Boris,

To check if your issue is related to Rafael's, could you check your access logs for requests on the missing objects which lasted longer than one hour?
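
A minimal sketch of such a check, in case it helps. It assumes a hypothetical whitespace-separated access-log layout of `<timestamp> <method> <path> <status> <duration_seconds>`, which is not necessarily what your radosgw or load balancer frontend writes, so the field parsing would need adapting. The function name, the `LongRequest` type and the sample paths are made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Set

ONE_HOUR_S = 3600.0


class LongRequest(NamedTuple):
    timestamp: str
    method: str
    path: str
    status: int
    duration_s: float


def find_long_requests(
    lines: Iterable[str],
    missing_paths: Set[str],
    threshold_s: float = ONE_HOUR_S,
) -> List[LongRequest]:
    """Return log entries for the given object paths that ran longer than threshold_s.

    Assumes each line has exactly five whitespace-separated fields
    (hypothetical layout, see above); anything else is skipped.
    """
    hits: List[LongRequest] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # line does not match the assumed layout
        timestamp, method, path, status, duration = fields
        try:
            entry = LongRequest(timestamp, method, path, int(status), float(duration))
        except ValueError:
            continue  # status or duration was not numeric
        if entry.path in missing_paths and entry.duration_s > threshold_s:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Made-up sample lines, only to show the call shape.
    sample = [
        "2021-07-20T10:00:00Z GET /mybucket/folder/file1 200 4000.5",
        "2021-07-20T11:00:00Z GET /mybucket/folder/file1 200 12.0",
    ]
    for hit in find_long_requests(sample, {"/mybucket/folder/file1"}):
        print(hit.timestamp, hit.method, hit.path, hit.duration_s)
```

The threshold is a parameter, so the same pass can be repeated with a value closer to the gc wait time if one hour turns up nothing.
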

I ask because Nautilus also has rgw_gc_obj_min_wait (2hr by default), which is the main config option related to https://tracker.ceph.com/issues/47866

-- Dan

On Thu, Jul 22, 2021 at 11:12 AM Dan van der Ster <[email protected]> wrote:
>
> Hi Rafael,
>
> AFAIU, that gc issue was not relevant for N -- the bug is in the new
> rgw_gc code which landed in Octopus and was not backported to N.
>
> Well, RHCEPH had the new rgw_gc cls backported to it, and RHCEPH has
> the bugfix you refer to:
> * Wed Dec 02 2020 Ceph Jenkins <[email protected]> 2:14.2.11-86
> - rgw: during GC defer, prevent new GC enqueue (rhbz#1892644)
> https://bugzilla.redhat.com/show_bug.cgi?id=1892644
>
> But still, I think it shouldn't apply to the upstream community
> Nautilus that we run.
>
> That said, this indeed looks really similar so perhaps Nautilus has
> similar faulty gc logic.
>
> Cheers, Dan
>
> On Thu, Jul 22, 2021 at 6:47 AM Rafael Lopez <[email protected]> wrote:
> >
> > hi boris,
> >
> > We hit an issue late last year that sounds similar to what you are
> > experiencing. I am not sure if the fix was backported to nautilus, I can't
> > see any reference to a nautilus backport so it's possible it was only
> > backported to octopus (15.x), exception being red hat ceph nautilus.
> >
> > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
> > https://www.mail-archive.com/[email protected]/msg05312.html
> >
> > Basically, a read request on a s3/swift object that took a very long time
> > to complete would cause the associated rados data objects to be put in the
> > GC queue, but the head object would still be present. So the s3 object
> > would still show as present, `rados bi list` would show it (since head
> > object was present) but the data objects would be gone, resulting in 404
> > NoSuchKey when retrieving the object.
> >
> > raf
> >
> > On Wed, 21 Jul 2021 at 18:12, Boris Behrens <[email protected]> wrote:
> >>
> >> Good morning everybody,
> >>
> >> we've dug further into it but still don't know how this could happen.
> >> What we ruled out for now:
> >> * Orphan objects cleanup process.
> >> ** There is only one bucket with missing data (I checked all other
> >> buckets yesterday)
> >> ** The "keep these files" list is generated by radosgw-admin bucket
> >> radoslist. I would doubt that there were files listed, that are
> >> accessible via radosgw
> >> ** The deleted files are somewhat random, but always with their
> >> corresponding counterparts (per folder there are 2-3 files that belong
> >> together)
> >>
> >> * Customer removed his data, but radosgw didn't clean up the bucket index
> >> ** There are no delete requests in the bucket's usage log.
> >> ** Customer told us that they do not have a delete job for this bucket
> >>
> >> So I am out of ideas that I could check, and hope that you people
> >> might be able to help with further ideas.
> >>
> >> --
> >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> >> im groüen Saal.
> >> _______________________________________________
> >> ceph-users mailing list -- [email protected]
> >> To unsubscribe send an email to [email protected]
> >
> > --
> > Rafael Lopez
> > Devops Systems Engineer
> > Monash University eResearch Centre
> >
> > E: [email protected]
