After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same issues reported earlier on the list under "rgw resharding operation seemingly won't end". Some buckets were automatically added to the reshard list, and something happened overnight such that they could no longer be written to. A couple of our radosgw nodes hung due to inadequate limits on file handles, which may have been a contributing cause.

I was able to correct the buckets using the "radosgw-admin bucket check --fix" command, and later disabled automatic resharding.
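For anyone following along, the recovery steps were along these lines (the bucket name is a placeholder, and I'm assuming the config option involved is rgw_dynamic_resharding, which I believe defaults to true in 12.2.x):

```shell
# Rebuild/repair the index stats for an affected bucket
# ("mybucket" is a placeholder name)
radosgw-admin bucket check --fix --bucket=mybucket

# Then disable automatic resharding by setting this in ceph.conf
# on the rgw nodes and restarting radosgw:
#
#   [client.rgw]
#   rgw dynamic resharding = false
```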

As an experiment, I selected an unsharded bucket to attempt a manual reshard. I added it to the reshard list, then ran "radosgw-admin reshard execute". The bucket in question contains around 184,000 objects and was being converted from 1 to 3 shards.
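Concretely, the sequence I ran looked roughly like this (bucket name again a placeholder):

```shell
# Queue the bucket for resharding from 1 to 3 shards
radosgw-admin reshard add --bucket=mybucket --num-shards=3

# Confirm it shows up on the reshard queue
radosgw-admin reshard list

# Start processing the queue - this is the command that never returned
radosgw-admin reshard execute
```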

I'm trying to understand what I found...

1) The "radosgw-admin reshard execute" command never returned. I had expected it to kick off a background operation, but perhaps that was mistaken.

2) After 2 days it was still running. Is there any way to check progress - for example, by querying something about the "new_bucket_instance_id" reported by "reshard status"?
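In case it clarifies what I mean by (2): "reshard status" reports the new instance id, and I was wondering whether counting omap keys on the new instance's index objects would give a progress indication. Something like the following is what I had in mind - the index pool name here is an assumption (adjust for your zone), and the loop is guesswork on my part, not a documented procedure:

```shell
# Report reshard state for the bucket, including new_bucket_instance_id
radosgw-admin reshard status --bucket=mybucket

# Guesswork: count index entries written so far to the new instance's
# shards by listing omap keys on the new .dir.<instance_id>.<shard>
# objects in the index pool (pool name assumed):
for obj in $(rados -p default.rgw.buckets.index ls | grep '<new_bucket_instance_id>'); do
    echo "$obj: $(rados -p default.rgw.buckets.index listomapkeys "$obj" | wc -l) keys"
done
```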

3) When I tested uploading an object to the bucket, I got an error - the client reported response code "UnknownError" - while radosgw logged:

2017-12-13 10:56:44.486131 7f02b2985700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2017-12-13 10:56:44.488657 7f02b2985700  0 NOTICE: resharding operation on bucket index detected, blocking

But the introduction to dynamic resharding says that "there is no need to stop IO operations that go to the bucket (although some concurrent operations may experience additional latency when resharding is in progress)" - so I feel sure something must be wrong here.

I'd like to get a feel for how long it might take to reshard a smallish bucket of this sort, and whether it can be done without making the bucket unwritable, before considering how to handle our older and more pathological buckets (multiple millions of objects in a single shard).

Thanks for any pointers,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - [email protected]
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
