Boris,

To check if your issue is related to Rafael's, could you check your
access logs for requests on the missing objects which lasted longer
than one hour?
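
If it helps, here is a rough sketch of that filter. It is only an illustration: the log line layout (timestamp, method, path, status, duration in seconds) is a hypothetical format I made up, and `slow_requests` is a made-up helper, so adjust the regex to whatever your frontend or proxy actually writes.

```python
import re

# ASSUMPTION: hypothetical access log layout, one request per line:
#   <timestamp> <method> <path> <status> <duration_seconds>
# Real radosgw / proxy logs will differ; adapt the pattern accordingly.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<secs>\d+(?:\.\d+)?)$"
)

ONE_HOUR_SECS = 3600.0


def slow_requests(lines, missing_paths, threshold=ONE_HOUR_SECS):
    """Return (path, seconds) for requests on missing objects that ran
    longer than `threshold` seconds. Unparseable lines are skipped."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        path = m.group("path")
        secs = float(m.group("secs"))
        if secs > threshold and path in missing_paths:
            hits.append((path, secs))
    return hits
```

Feed it the log lines and the set of paths of the objects that now return 404; any hit would be a candidate for the long-running-request scenario.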

I ask because Nautilus also has rgw_gc_obj_min_wait (2hr by default),
which is the main config option related to
https://tracker.ceph.com/issues/47866


-- Dan

On Thu, Jul 22, 2021 at 11:12 AM Dan van der Ster <[email protected]> wrote:
>
> Hi Rafael,
>
> AFAIU, that gc issue was not relevant for N -- the bug is in the new
> rgw_gc code which landed in Octopus and was not backported to N.
>
> Well, RHCEPH had the new rgw_gc cls backported to it, and RHCEPH has
> the bugfix you refer to:
> * Wed Dec 02 2020 Ceph Jenkins <[email protected]> 2:14.2.11-86
> - rgw: during GC defer, prevent new GC enqueue (rhbz#1892644)
> https://bugzilla.redhat.com/show_bug.cgi?id=1892644
>
> But still, I think it shouldn't apply to the upstream community
> Nautilus that we run.
>
> That said, this indeed looks really similar so perhaps Nautilus has
> similar faulty gc logic.
>
> Cheers, Dan
>
> On Thu, Jul 22, 2021 at 6:47 AM Rafael Lopez <[email protected]> wrote:
> >
> > hi boris,
> >
> > We hit an issue late last year that sounds similar to what you are
> > experiencing. I am not sure if the fix was backported to nautilus; I can't
> > see any reference to a nautilus backport, so it's possible it was only
> > backported to octopus (15.x), the exception being red hat ceph nautilus.
> >
> > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59
> > https://www.mail-archive.com/[email protected]/msg05312.html
> >
> > Basically, a read request on an s3/swift object that took a very long time
> > to complete would cause the associated rados data objects to be put in the
> > GC queue, but the head object would still be present. So the s3 object
> > would still show as present, `radosgw-admin bi list` would show it (since
> > the head object was present) but the data objects would be gone, resulting
> > in 404 NoSuchKey when retrieving the object.
> >
> > raf
> >
> > On Wed, 21 Jul 2021 at 18:12, Boris Behrens <[email protected]> wrote:
> >>
> >> Good morning everybody,
> >>
> >> we've dug further into it but still don't know how this could happen.
> >> What we ruled out for now:
> >> * Orphan objects cleanup process.
> >> ** There is only one bucket with missing data (I checked all other
> >> buckets yesterday)
> > >> ** The "keep these files" list is generated by radosgw-admin bucket
> > >> radoslist. I would doubt that there were files listed that are
> > >> accessible via radosgw
> >> ** The deleted files are somewhat random, but always with their
> >> corresponding counterparts (per folder there are 2-3 files that belong
> >> together)
> >>
> > >> * Customer removed their data, but radosgw didn't clean up the bucket index
> > >> ** there are no delete requests in the bucket's usage log.
> > >> ** the customer told us that they do not have a delete job for this bucket
> >>
> > >> So I am out of ideas for what else to check, and hope that you
> > >> might be able to help with further ideas.
> >>
> >>
> >>
> >>
> >> --
> >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> >> im groüen Saal.
> >> _______________________________________________
> >> ceph-users mailing list -- [email protected]
> >> To unsubscribe send an email to [email protected]
> >
> >
> >
> > --
> > Rafael Lopez
> > Devops Systems Engineer
> > Monash University eResearch Centre
> >
> > E: [email protected]
> >