Trying one more time with ceph-users
On 19/06/17, 11:07 PM, "Pavan Rallabhandi" wrote:
On many of our clusters running Jewel (10.2.5+), I am running into a strange
problem: stale bucket index entries are left over for some of the deleted
objects. Though it is not reproducible at will, it has been fairly consistent
of late, and I am clueless at this point about the possible reasons for it.
The symptom is that the delete operation for an object is reported as
successful in the RGW logs, but a bucket listing on the container still shows
the deleted object. An attempt to download or stat the object fails, as
expected. No failures are seen on the OSDs where the bucket index object is
located, and rebuilding the bucket index by running
'radosgw-admin bucket check --fix' fixes the issue.
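To make the symptom concrete, here is a toy model in Python of the mismatch a rebuild has to reconcile: an index entry that survives even though its backing object is gone. The functions `find_stale_entries` and `rebuild_index` are hypothetical illustrations, not how radosgw-admin actually implements `bucket check --fix`.

```python
# Toy model only: the "index" maps object names to metadata and "stored" is
# the set of objects that actually exist. Neither mirrors Ceph's real layout.
from typing import Dict, List, Set


def find_stale_entries(index: Dict[str, dict], stored: Set[str]) -> List[str]:
    """Return index entries whose backing object no longer exists."""
    return sorted(name for name in index if name not in stored)


def rebuild_index(index: Dict[str, dict], stored: Set[str]) -> List[str]:
    """Drop stale entries in place and return the names that were removed."""
    stale = find_stale_entries(index, stored)
    for name in stale:
        del index[name]
    return stale


if __name__ == "__main__":
    index = {"a.txt": {"size": 3}, "b.txt": {"size": 5}}
    stored = {"a.txt"}  # b.txt was deleted, but its index entry was left behind
    print(find_stale_entries(index, stored))  # ['b.txt']
    print(rebuild_index(index, stored))       # ['b.txt']
    print(sorted(index))                      # ['a.txt']
```

In this model a listing driven by `index` would keep showing `b.txt` while a stat against `stored` fails, which matches the behaviour described above.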
I could simulate the problem by instrumenting the code so that `complete_del`
is not invoked on the bucket index op
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793), but that
call always seems to be made unless there is a cascading error from the actual
object delete, which doesn't seem to be the case here.
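The reasoning above relies on the index update being split into a prepare step before the object delete and a completion step after it. A minimal Python sketch of that two-phase pattern, assuming the entry stays listed until completion, shows why skipping `complete_del` leaves exactly this kind of stale entry. `ToyBucket` and its methods are hypothetical and only loosely echo RGW's naming; they are not RGW's implementation.

```python
# Toy model of a two-phase bucket index update around an object delete.
# All names are hypothetical; this is not RGW's actual code path.
from typing import Dict, List, Set


class ToyBucket:
    def __init__(self) -> None:
        self.objects: Set[str] = set()   # stands in for the data objects
        self.index: Dict[str, str] = {}  # name -> state ("ok" or "pending-del")

    def put(self, name: str) -> None:
        self.objects.add(name)
        self.index[name] = "ok"

    def prepare_del(self, name: str) -> None:
        # Phase 1: mark a pending delete; the entry is still listed.
        self.index[name] = "pending-del"

    def complete_del(self, name: str) -> None:
        # Phase 2: only now is the entry removed from the index.
        self.index.pop(name, None)

    def delete(self, name: str, skip_complete: bool = False) -> None:
        self.prepare_del(name)
        self.objects.discard(name)  # the actual object delete succeeds
        if not skip_complete:       # simulates the instrumented code path
            self.complete_del(name)

    def listing(self) -> List[str]:
        return sorted(self.index)

    def stat(self, name: str) -> bool:
        return name in self.objects


if __name__ == "__main__":
    b = ToyBucket()
    b.put("a")
    b.put("b")
    b.delete("a")                     # normal path: entry disappears
    b.delete("b", skip_complete=True) # completion lost: entry lingers
    print(b.listing())  # ['b']
    print(b.stat("b"))  # False
```

The model only shows that any path where the data delete succeeds but the completion never reaches the index produces the reported symptom; it says nothing about which real path that is.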
I wanted to know the possible reasons the bucket index could be left in such a
limbo; any pointers would be much appreciated. FWIW, we are not sharding the
buckets, very recently I've seen this happen with buckets holding fewer than
10 objects, and we are using Swift for all operations.
Thanks,
-Pavan.