Yo, that's one of the threads that looks very similar to my problem, just 
with no resolution for me. 
I have a multi-site setup, so no resharding. I tried it anyway and ended up 
setting up RGW from scratch due to all the “funny” errors.

So, hoping for 16.2.7 or the “in the works” fix :)

Ciao, Uli 

> On 17.03.2022, at 15:34, Matt Benjamin <[email protected]> wrote:
> 
> Thanks, Soumya.
> 
> It's also possible that what's reproducing is the known (space) leak
> during re-upload of multipart parts, described here:
> https://tracker.ceph.com/issues/44660.
> A fix for this is being worked on; it's taking a while.
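> 
> (If it is that leak, the experimental rgw-orphan-list tool shipped with
> recent Ceph releases may help enumerate RADOS objects no longer referenced
> by any bucket index; I believe it's invoked with the data pool name, e.g.
> 
> $ rgw-orphan-list default.rgw.buckets.data
> 
> though it only lists candidates, it doesn't delete anything.)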
> 
> Matt
> 
> On Thu, Mar 17, 2022 at 10:31 AM Soumya Koduri <[email protected]> wrote:
>> 
>> On 3/17/22 17:16, Ulrich Klein wrote:
>>> Hi,
>>> 
>>> This is my second attempt to get help with a problem I've been trying to 
>>> solve for about six months now.
>>> 
>>> I have a Ceph 16.2.6 test cluster, used almost exclusively for providing 
>>> RGW/S3 service, similar to a production cluster.
>>> 
>>> The problem I have is this:
>>> - A client uploads (via S3) a bunch of large files into a bucket via 
>>> multipart uploads.
>>> - The upload(s) get interrupted and retried.
>>> - In the end, from the client's perspective, all the files are visible and 
>>> everything looks fine.
>>> - But on the cluster there are many more objects in the bucket.
>>> - Even after cleaning out the incomplete multipart uploads there are too 
>>> many objects.
>>> - Even after deleting all the visible objects from the bucket there are 
>>> still objects in the bucket.
>>> I have so far found no way to get rid of those left-over objects.
>>> It's screwing up space accounting, and I'm afraid I'll eventually have a 
>>> cluster full of those lost objects.
>>> The only way to clean up seems to be to copy the contents of a bucket to a 
>>> new bucket and delete the screwed-up bucket. But on a production system 
>>> that's not always a real option.
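>>> 
>>> One way I can see the discrepancy (a sketch, assuming default pool names; 
>>> on my test cluster the data pool holds only this bucket) is to compare the 
>>> bucket index stats with the RADOS-level listings:
>>> 
>>> $ radosgw-admin bucket stats --bucket=test-bucket   # index-level count/size
>>> $ radosgw-admin bucket radoslist --bucket=test-bucket | wc -l
>>> $ rados ls -p default.rgw.buckets.data | wc -l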
>>> 
>>> I've found a variety of older threads that describe a similar problem, but 
>>> none of them describes a solution :(
>>> 
>>> 
>>> 
>>> I can pretty easily reproduce the problem with this sequence:
>>> 
>>> On a client system, create a directory with ~30 200 MB files (on a faster 
>>> system I'd probably need bigger or more files):
>>> tstfiles/tst01 - tst29
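>>> 
>>> Something like this generates them (a sketch; /dev/urandom just avoids 
>>> easily compressible data):
>>> 
>>> $ mkdir tstfiles
>>> $ for i in $(seq -w 1 29); do dd if=/dev/urandom of=tstfiles/tst$i bs=1M count=200; done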
>>> 
>>> Run
>>> $ rclone mkdir tester:/test-bucket # creates a bucket on the test system 
>>> with user tester
>>> then run
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles
>>> a couple of times (6-8), interrupting each one via Ctrl-C.
>>> Eventually let one finish.
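>>> 
>>> (To script the interruptions instead of hitting Ctrl-C by hand, something 
>>> like this should behave the same way; the 30s timeout is just a guess for 
>>> my link speed:)
>>> 
>>> $ for n in $(seq 6); do timeout -s INT 30 rclone sync -v tstfiles tester:/test-bucket/tstfiles; done
>>> $ rclone sync -v tstfiles tester:/test-bucket/tstfiles  # let the last one finish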
>>> 
>>> Now I can use s3cmd to see all the files:
>>> $ s3cmd ls -lr s3://test-bucket/tstfiles
>>> 2022-03-16 17:11   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst01
>>> ...
>>> 2022-03-16 17:13   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst29
>>> 
>>> ... and to list incomplete uploads:
>>> $ s3cmd multipart s3://test-bucket
>>> s3://test-bucket/
>>> Initiated     Path    Id
>>> 2022-03-16T17:11:19.074Z      s3://test-bucket/tstfiles/tst05       2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
>>> 2022-03-16T17:12:41.583Z      s3://test-bucket/tstfiles/tst28       2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa
>>> 
>>> I can abort the uploads with
>>> $ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
>>> ...
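>>> 
>>> Or, to abort them all in one go (a sketch that relies on the column layout 
>>> of the s3cmd multipart output shown above):
>>> 
>>> $ s3cmd multipart s3://test-bucket | tail -n +3 | \
>>>     while read ts path id; do s3cmd abortmp "$path" "$id"; done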
>> 
>> 
>> 
>> On the latest master, I see that these objects are deleted immediately
>> after abortmp. I believe this issue may have been fixed as part of [1],
>> backported to v16.2.7 [2]. Maybe you could try upgrading your cluster
>> and rechecking.
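>> 
>> For example (assuming a cephadm-managed cluster; adjust for your deployment):
>> 
>> $ ceph versions    # confirm what's currently running
>> $ ceph orch upgrade start --ceph-version 16.2.7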
>> 
>> 
>> Thanks,
>> 
>> Soumya
>> 
>> 
>> [1] https://tracker.ceph.com/issues/53222
>> 
>> [2] https://tracker.ceph.com/issues/53291
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> 

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
