[ceph-users] FW: radosgw: stale/leaked bucket index entries

Pavan Rallabhandi Mon, 19 Jun 2017 22:29:52 -0700

Trying one more time with ceph-users

On 19/06/17, 11:07 PM, "Pavan Rallabhandi" <[email protected]> wrote:


    On many of our clusters running Jewel (10.2.5+), I am running into a strange
    problem: stale bucket index entries are left over for some of the deleted
    objects. Though it is not reproducible at will, it has been pretty
    consistent of late, and at this point I am clueless as to why it happens.
    
    The symptom is that the delete operation for an object is reported as
    successful in the RGW logs, but a bucket listing on the container still
    shows the deleted object. Downloading or stat'ing the object fails, as
    expected. No failures are seen on the OSDs where the bucket index object is
    located, and rebuilding the bucket index by running
    `radosgw-admin bucket check --fix` fixes the issue.
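    To make the symptom concrete, here is a toy Python model. It is not RGW
    code. A listing is answered from a separate index while a stat goes to the
    object itself, so the two can disagree. The hypothetical `check_fix` helper
    loosely imitates the effect of rebuilding the index by dropping entries
    that have no backing object. All names are illustrative assumptions, and
    this is not how `radosgw-admin bucket check --fix` is actually implemented.

```python
# Toy model (hypothetical, not RGW code): the bucket listing is served from an
# index kept separately from the objects, so the two can drift apart.


def list_bucket(index):
    """Listing is answered from the index alone, never from the objects."""
    return sorted(index)


def stat_object(objects, name):
    """Stat goes to the object itself, so it fails for a deleted object."""
    if name not in objects:
        raise FileNotFoundError(name)
    return len(objects[name])


def check_fix(index, objects):
    """Hypothetical reconcile: drop index entries with no backing object."""
    stale = [name for name in index if name not in objects]
    for name in stale:
        del index[name]
    return sorted(stale)


if __name__ == "__main__":
    objects = {"a.txt": b"hello", "b.txt": b"world"}
    index = {name: len(data) for name, data in objects.items()}

    # The object is deleted, but the matching index update never lands.
    del objects["b.txt"]

    print(list_bucket(index))  # ['a.txt', 'b.txt']: still listed
    try:
        stat_object(objects, "b.txt")
    except FileNotFoundError:
        print("stat b.txt: not found")
    print(check_fix(index, objects))  # ['b.txt']
    print(list_bucket(index))  # ['a.txt']
```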
    
    I could simulate the problem by instrumenting the code so that it does not
    invoke `complete_del` on the bucket index op
    (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793), but
    that call always seems to be made unless there is a cascading error from the
    actual delete operation on the object, which does not seem to be the case
    here.
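    A rough sketch of why a skipped `complete_del` produces exactly this
    symptom. It assumes a simplified two-phase index update: prepare an index
    op, remove the object, then complete the index op. `BucketIndex`,
    `prepare`, `delete_object` and the tag strings are hypothetical. Only the
    name `complete_del` comes from the text above, and the real RGW/cls_rgw code
    path is considerably more involved.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified two-phase bucket index update.
# Names are illustrative only, not RGW's actual classes or cls_rgw ops.


@dataclass
class IndexEntry:
    # Tags of operations that were prepared but not yet completed.
    pending: set = field(default_factory=set)


@dataclass
class BucketIndex:
    entries: dict = field(default_factory=dict)

    def add(self, name):
        self.entries[name] = IndexEntry()

    def prepare(self, name, tag):
        """Phase 1: mark an in-flight op on the entry before touching the object."""
        self.entries.setdefault(name, IndexEntry()).pending.add(tag)

    def complete_del(self, name, tag):
        """Phase 2 of a delete: drop the entry once the object is gone."""
        entry = self.entries.get(name)
        if entry is not None and tag in entry.pending:
            del self.entries[name]

    def listing(self):
        """Listings are served from the index, pending or not."""
        return sorted(self.entries)


def delete_object(index, objects, name, tag, skip_complete=False):
    """Delete flow: prepare on the index, remove the object, then complete."""
    index.prepare(name, tag)
    del objects[name]  # the object removal itself succeeds
    if not skip_complete:
        # Skipping this call mirrors the instrumentation described in the mail.
        index.complete_del(name, tag)


if __name__ == "__main__":
    objects = {"a": b"1", "b": b"2"}
    index = BucketIndex()
    for name in objects:
        index.add(name)

    delete_object(index, objects, "a", "op-1")  # normal path
    delete_object(index, objects, "b", "op-2", skip_complete=True)  # leaked path
    print(index.listing())  # ['b']: listed, yet the object is gone
    print(index.entries["b"].pending)  # {'op-2'}: prepared, never completed
```

    In this model the object delete succeeds and nothing reports an error, so
    only the index is left wrong, which matches the symptom described above.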
    
    I would like to know the possible reasons the bucket index could be left in
    such a limbo; any pointers would be much appreciated. FWIW, we are not
    sharding the buckets, and very recently I have seen this happen with buckets
    holding fewer than 10 objects. We are using Swift for all operations.
    
    Thanks,
    -Pavan.
    
    

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
