Greetings,

Background: If an object storage client re-uploads parts to a multipart
object, RadosGW does not clean up all of the parts properly when the
multipart upload is aborted or completed.  You can read all of the gory
details (including reproduction steps) in this bug report:
http://tracker.ceph.com/issues/16767.

My setup: Hammer 0.94.6 cluster only used for S3-compatible object
storage.  RGW stripe size is 4MiB.

My problem: I have buckets that are reporting terabytes more utilization
(and, in one case, 200k more objects) than they should.  I am trying to
remove the detritus from the multipart uploads, but removing the leftover
parts directly from the .rgw.buckets pool is having no effect on bucket
utilization (i.e. neither the object count nor the space used are
declining).

To give an example, I have a client that uploaded a very large multipart
object (8000 15MiB parts).  Due to a bug in the client, it uploaded each of
the 8000 parts 6 times.  After the sixth attempt, it gave up and aborted
the upload, at which point RGW removed the 8000 parts from the sixth
attempt.  When I list the bucket's contents with radosgw-admin
(radosgw-admin bucket list --bucket=<bucket> --max-entries=<size of
bucket>), I see all of the object's 8000 parts five separate times, each
under a namespace of 'multipart'.
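For anyone following along, this is roughly how I'm isolating the duplicated
entries from that listing.  The 'name' and 'ns' keys are my assumption about
the shape of the JSON that radosgw-admin emits; adjust to match your actual
output:

```python
import json
from collections import Counter

def find_duplicated_multipart_entries(listing_json):
    """Count 'multipart'-namespace entries by name and report duplicates.

    Assumes each entry in the bucket list output is a dict carrying at
    least 'name' and 'ns' keys (an assumption -- verify against the real
    radosgw-admin JSON).  Names appearing more than once correspond to
    re-uploaded parts that were never cleaned up.
    """
    entries = json.loads(listing_json)
    counts = Counter(e["name"] for e in entries if e.get("ns") == "multipart")
    return {name: n for name, n in counts.items() if n > 1}
```

In my case each of the 8000 part names shows up five times in the result.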

Since the multipart upload was aborted, I can't remove the object by name
via the S3 interface.  Since my RGW stripe size is 4MiB, I know that each
part of the object will be stored across 4 entries in the .rgw.buckets pool
-- 4 MiB in a 'multipart' file, and 4, 4, and 3 MiB in three successive
'shadow' files.  I've created a script to remove these parts (rados -p
.rgw.buckets rm <bucket_id>__multipart_<object+prefix>.<part> and rados -p
.rgw.buckets rm <bucket_id>__shadow_<object+prefix>.<part>.[1-3]).  The
removes are completing successfully (subsequent attempts to remove the same
objects fail because they no longer exist), but I'm not seeing any decrease
in the
bucket's space used, nor am I seeing a decrease in the bucket's object
count.  In fact, if I do another 'bucket list', all of the removed parts
are still included.
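For clarity, here is a sketch of how my script derives the rados object
names for one part.  The naming scheme matches what I see in .rgw.buckets;
build_part_object_names and its arguments are just my own helper, not
anything from the RGW tooling:

```python
def build_part_object_names(bucket_id, obj_prefix, part_num,
                            part_size_mib=15, stripe_size_mib=4):
    """Return the rados object names in .rgw.buckets backing one part.

    With a 4 MiB stripe, a 15 MiB part is stored as one '__multipart_'
    head object plus three '__shadow_' objects (4 + 4 + 4 + 3 MiB).
    """
    # Ceiling division: number of 4 MiB stripes needed for the part.
    n_stripes = -(-part_size_mib // stripe_size_mib)
    # First stripe lives in the 'multipart' object.
    names = ["%s__multipart_%s.%s" % (bucket_id, obj_prefix, part_num)]
    # Remaining stripes live in 'shadow' objects suffixed .1, .2, .3.
    for i in range(1, n_stripes):
        names.append("%s__shadow_%s.%s.%d" % (bucket_id, obj_prefix,
                                              part_num, i))
    return names
```

Each name the helper returns is then fed to 'rados -p .rgw.buckets rm <name>'.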

I've looked at the output of 'gc list --include-all', and the removed parts
are never showing up for garbage collection.  Garbage collection is
otherwise functioning normally and will successfully remove data for any
object properly removed via the S3 interface.

I've also gone so far as to write a script to list the contents of bucket
shards in the .rgw.buckets.index pool, check for the existence of the entry
in .rgw.buckets, and remove entries that cannot be found, but that is also
failing to decrement the size/object count counters.
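The cross-check in that script boils down to a set difference, sketched
below with the actual listing calls left out (in the real script the two
inputs come from reading the index shard omaps in .rgw.buckets.index and
listing .rgw.buckets):

```python
def find_stale_index_entries(index_entries, data_pool_objects):
    """Return index entries whose backing rados objects no longer exist.

    index_entries: iterable of object names recorded in the bucket index
        (read from the .rgw.buckets.index shard omaps).
    data_pool_objects: collection of object names actually present in
        .rgw.buckets.
    """
    present = set(data_pool_objects)
    return [name for name in index_entries if name not in present]
```

Every name this returns gets removed from the corresponding index shard,
yet the size/object count counters still do not budge.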

What am I missing here?  Where, aside from .rgw.buckets and
.rgw.buckets.index is RGW looking to determine object count and space used
for a bucket?

Many thanks to any and all who can assist.

Brian Felton
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
