Hi Andreas,

Interesting, as we are also on Jewel 10.2.7. We do care about the data in the bucket, so we really need the reshard process to run properly :). Could you maybe share how you linked the bucket to the new index by hand? That would already give me some extra insight. Thanks!
Regards,
Maarten

On Wed, Jul 5, 2017 at 10:21 AM, Andreas Calminder <[email protected]> wrote:
> Hi,
> I had a similar problem while resharding an oversized, non-sharded
> bucket in Jewel (10.2.7): bi_list exited with "ERROR: bi_list(): (4)
> Interrupted system call" at what seemed like the very end of the
> operation. I went ahead and resharded the bucket anyway, and the
> reshard process ended the same way, seemingly at the end. Reshard
> didn't link the bucket to the new instance id though, so I had to do
> that by hand and then purge the index from the old instance id.
> Note that I didn't care about the data in the bucket; I just wanted
> to reshard the index so I could delete the bucket without my radosgw
> and OSDs crashing due to out-of-memory issues.
>
> Regards,
> Andreas
>
> On 4 July 2017 at 20:46, Maarten De Quick <[email protected]> wrote:
> > Hi,
> >
> > Background: We're having issues with our index pool (slow requests /
> > timeouts cause crashing of an OSD and a recovery -> application
> > issues). We know we have very big buckets (e.g. a bucket of 77
> > million objects with only 16 shards) that need a reshard, so we were
> > looking at the resharding process.
> >
> > The first thing we would like to do is make a backup of the bucket
> > index, but this failed with:
> >
> > # radosgw-admin -n client.radosgw.be-west-3 bi list \
> >     --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
> > 2017-07-03 21:28:30.325613 7f07fb8bc9c0  0 System already converted
> > ERROR: bi_list(): (4) Interrupted system call
> >
> > When I grep for "idx" and count the entries:
> > # grep idx priv-prod-up-alex.list.backup | wc -l
> > 2294942
> > When I do a bucket stats for that bucket I get:
> > # radosgw-admin -n client.radosgw.be-west-3 bucket stats \
> >     --bucket=priv-prod-up-alex | grep num_objects
> > 2017-07-03 21:33:05.776499 7faca49b89c0  0 System already converted
> > "num_objects": 20148575
> >
> > It looks like roughly 18 million entries are missing and the backup
> > is not complete (not sure if that's a correct assumption?). We're
> > also afraid that the resharding command will face the same issue.
> > Has anyone seen this behaviour before, or any thoughts on how to
> > fix it?
> >
> > We were also wondering if we really need the backup. As the
> > resharding process creates a completely new index and keeps the old
> > bucket, is there maybe a possibility to relink your bucket to the
> > old index in case of issues? Or am I missing something important
> > here?
> >
> > Any help would be greatly appreciated, thanks!
> >
> > Regards,
> > Maarten
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
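[The manual relink Andreas mentions isn't spelled out in the thread. One plausible way to do it on Jewel is to repoint the bucket entrypoint metadata at the new index instance id and then purge the old one. This is an untested sketch, not a confirmed recipe: the bucket name is taken from the thread, the instance id is a placeholder, and the exact ids must be confirmed with the metadata commands before changing anything.]

```shell
#!/bin/sh
# Sketch of a manual bucket-to-index relink on Jewel, followed by a
# purge of the old index. Requires a live RGW cluster; instance ids
# below are placeholders.
BUCKET=priv-prod-up-alex

# Show the current entrypoint, including the bucket_id it points at:
radosgw-admin metadata get bucket:$BUCKET

# List the index instances known for this bucket:
radosgw-admin metadata list bucket.instance | grep $BUCKET

# Repoint the entrypoint: dump the metadata, edit "bucket_id" to the
# new instance id by hand, then write it back.
radosgw-admin metadata get bucket:$BUCKET > /tmp/$BUCKET.entrypoint.json
# ...edit bucket_id in /tmp/$BUCKET.entrypoint.json...
radosgw-admin metadata put bucket:$BUCKET < /tmp/$BUCKET.entrypoint.json

# Only after verifying the bucket is readable again: purge the index
# held by the old instance id (placeholder below).
radosgw-admin bi purge --bucket=$BUCKET --bucket-id=OLD_INSTANCE_ID
```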
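[The completeness check Maarten does by hand, comparing `bi list` output against `bucket stats`, can be scripted. A minimal sketch, assuming GNU grep, the backup path from the thread, and that the first `num_objects` line in the stats output is the relevant one; it cannot run without a live cluster.]

```shell
#!/bin/sh
# Compare index entries captured by `bi list` with the object count
# reported by `bucket stats`, to spot a truncated backup.
BUCKET=priv-prod-up-alex
BACKUP=/var/backup/$BUCKET.list.backup

# Entries actually written to the backup file:
listed=$(grep -c idx "$BACKUP")

# Object count according to bucket stats (first num_objects line):
expected=$(radosgw-admin bucket stats --bucket=$BUCKET \
             | grep num_objects | head -1 | grep -oE '[0-9]+')

echo "entries in backup: $listed / num_objects: $expected"
if [ "$listed" -lt "$expected" ]; then
    echo "WARNING: backup appears incomplete"
fi
```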
