For anyone interested or having similar issues, I figured out what was
wrong by running
# radosgw-admin --cluster ceph bucket check \
    --bucket=12856/weird_bucket --check-objects > obj.check.out

I reviewed the +1M entries in the file. I wasn't really sure what the
output was about, but figured it was probably objects in the bucket
that were considered broken somehow. Running
# radosgw-admin --cluster ceph object stat --bucket=weird_bucket \
    --object=$OBJECT
on some of these objects returned "File not found".
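
If you want to check all the suspect entries in bulk rather than
picking a few by hand, a rough loop like the one below should do it.
This is only a sketch: it assumes you have pulled the object names
out of obj.check.out into object_names.txt (one name per line), and
that 'object stat' exits non-zero when the object is missing; if it
doesn't on your version, grep its output for the error instead.

# collect the entries that can't be stat'ed (object_names.txt is
# assumed to hold one object name per line, taken from obj.check.out)
while read -r OBJECT; do
    radosgw-admin --cluster ceph object stat \
        --bucket=weird_bucket --object="$OBJECT" > /dev/null 2>&1 \
        || echo "$OBJECT" >> dead_objects.txt
done < object_names.txt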

I then ran
# radosgw-admin --cluster ceph bucket check \
    --bucket=12856/weird_bucket --check-objects --fix
in the hope that it would fix the index by removing the dead object
entries, but it didn't. I'm not sure why; --fix might do something
else, the --help text just says "besides checking bucket index, will
also fix it" :).

I picked out some of the dead objects, prepended the bucket instance
id to each of them and had rados put a dummy file into the
buckets.data pool:
# rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data put \
    be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_$OBJECT dummy.file
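
To double-check that a dummy object actually landed under the right
name before retrying the removal, rados stat on the same name should
come back with a size and mtime instead of an error (this is just how
I'd verify it, the put above is the part that matters):
# rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data stat \
    be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_$OBJECT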

Lo and behold, the rm command was finally able to remove the objects.
Realizing that the number of stale entries in the index was almost
the same as the object count 'bucket stats' reported, I gave up on
scripting rados to put dummy files behind all the stale entries and
instead went with Red Hat's solution for removing stale buckets
(https://access.redhat.com/solutions/2110551)*. Since there were only
something like 30 "real" objects in the bucket, having those floating
around without a bucket was a lower cost than spending time on
scripting and then removing the bucket.
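
For anyone who would rather script it than go the Red Hat route, the
loop I had in mind was roughly the one below. It's untested since I
bailed on it; dead_objects.txt is the list of stale entry names (e.g.
from the stat loop further up) and the bucket instance id prefix is
the one from my cluster, yours will be different.

touch dummy.file
# put a dummy object behind every stale index entry so rm can delete it
while read -r OBJECT; do
    rados -c /etc/ceph/ceph.conf -p ceph.rgw.buckets.data put \
        "be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_${OBJECT}" dummy.file
done < dead_objects.txt

followed by the usual
# radosgw-admin --cluster ceph bucket rm --bucket=12856/weird_bucket \
    --purge-objects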

I'm not sure how I ended up with this many stale entries. It might
have something to do with the fact that the user owning this bucket
also had a lot of other oversized bucket indexes (+6M objects)
without index sharding (resharding doesn't work that well, but that's
a different thread), in a multi-site environment where RGWs were
crashing every now and then due to memory leak bugs while said
oversized indexes were being altered, all at the same time.

* The RH solution article is for Hammer; I'm running Jewel 10.2.7

It was great fun. I hope this helps anyone having similar issues.
Cheers!

/andreas

On 8 August 2017 at 12:31, Andreas Calminder
<andreas.calmin...@klarna.com> wrote:
> Hi,
> I'm running into a weird issue while trying to delete a bucket with
> radosgw-admin
>
> #  radosgw-admin --cluster ceph bucket rm --bucket=12856/weird_bucket
> --purge-objects
>
> This returns almost instantly even though the bucket contains +1M
> objects and the bucket isn't removed. Running the above command with
> debug flags (--debug-rgw=20 --debug-ms 20), I notice the session
> closing down after encountering:
> 2017-08-08 10:51:52.032946 7f8a9caf4700 10 -- CLIENT_IP:0/482026554 >>
> ENDPOINT_IP:6800/5740 pipe(0x7f8ac2acc8c0 sd=7 :3482 s=2 pgs=7856733
> cs=1 l=1 c=0x7f8ac2acb3a0).reader got message 8 0x7f8a64001640
> osd_op_reply(218
> be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_a_weird_object
> [getxattrs,stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7
> 2017-08-08 10:51:52.032970 7f8a9caf4700  1 -- CLIENT_IP:0/482026554
> <== osd.47 ENDPOINT_IP:6800/5740 8 ==== osd_op_reply(218
> be8fa19b-ad79-4cd8-ac7b-1e14fdc882f6.2384280.20_a_weird_object
> [getxattrs,stat] v0'0 uv0 ack = -2 ((2) No such file or directory)) v7
> ==== 317+0+0 (3298345941 0 0) 0x7f8a64001640 con 0x7f8ac2acb3a0
>
> If I understand the output correctly, the file wasn't found and the
> session was closed down. The radosgw-admin command doesn't hint that
> anything bad has happened though.
>
> Has anyone seen this behaviour or anything similar? Any pointers on
> how to fix it would be appreciated; I just want to get rid of the
> bucket since it's both over-sized and unused.
>
> Best regards,
> Andreas