Re: [ceph-users] RGW problems after upgrade to Luminous

David Turner Fri, 03 Aug 2018 10:54:29 -0700

I had seen you mention bucket check --fix before, but I completely forgot
that the command needs --bucket=mybucket to actually do anything.  I'm
running it now and it does seem to be doing something.  My guess is that
the bucket was stuck in the resharding state; once the bucket is cleaned up
I should be able to try resharding it again.
Thank you so much.
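
For anyone finding this thread later, here is a rough sketch of the
invocation discussed above.  "mybucket" is a placeholder, and the run
helper with its RUN switch is only an illustration added here so the
commands can be previewed before being executed; it is not part of
radosgw-admin.

```shell
#!/bin/sh
# Sketch only. BUCKET defaults to the placeholder name "mybucket".
BUCKET="${BUCKET:-mybucket}"

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# bucket check --fix needs --bucket to have something to act on.
run radosgw-admin bucket check --fix --bucket="$BUCKET"

# Afterwards, look at the per-shard reshard status again.
run radosgw-admin reshard status --bucket="$BUCKET"
```

Run as-is it only prints the two commands; setting RUN=1 and BUCKET to
the real bucket name executes them.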


On Fri, Aug 3, 2018 at 12:50 PM Yehuda Sadeh-Weinraub <[email protected]>
wrote:

> Oh, also -- one thing that might work is running bucket check --fix on
> the bucket. That should overwrite the reshard status field in the
> bucket index.
>
> Let me know if it happens to fix the issue for you.
>
> Yehuda.
>
> On Fri, Aug 3, 2018 at 9:46 AM, Yehuda Sadeh-Weinraub <[email protected]>
> wrote:
> > Is it actually resharding, or is it just stuck in that state?
> >
> > On Fri, Aug 3, 2018 at 7:55 AM, David Turner <[email protected]>
> wrote:
> >> I am currently unable to write any data to this bucket in its current
> >> state.  Does anyone have any ideas for reverting to the original index
> >> shards and cancelling the reshard processes running against the bucket?
> >>
> >> On Thu, Aug 2, 2018 at 12:32 PM David Turner <[email protected]>
> wrote:
> >>>
> >>> I upgraded my last cluster to Luminous last night.  It had some very
> >>> large bucket indexes on Jewel, which caused a couple of problems with
> >>> the upgrade, but everything eventually finished.  Now, however, the
> >>> errors in [1] are filling a lot of our RGW logs and clients are seeing
> >>> time skew error responses.  The timestamps on the client nodes, the
> >>> RGW nodes, and the rest of the Ceph cluster match perfectly and all
> >>> sync from the same NTP server.
> >>>
> >>> I tried disabling dynamic resharding for the RGW daemons by placing
> >>> `rgw_dynamic_resharding = false` in the ceph.conf for the affected
> >>> daemons and restarting them, and I also issued a reshard cancel for
> >>> the bucket, but nothing seems to actually stop the reshard from
> >>> processing.  Here is the output of a few commands: [2] reshard list,
> >>> [3] reshard status.
> >>>
> >>> Is there anything we can do to actually disable bucket resharding or
> >>> let it finish?  I'm out of ideas.  I've tried quite a few of the
> >>> suggestions I've found, except for manually resharding, which is a
> >>> last resort here.  This bucket won't exist in a couple of months and
> >>> the performance is good enough without resharding, but I don't know
> >>> how to get it to stop.  Thanks.
> >>>
> >>>
> >>> [1] 2018-08-02 16:22:16.047387 7fbe82e61700  0 NOTICE: resharding operation on bucket index detected, blocking
> >>> 2018-08-02 16:22:16.206950 7fbe8de77700  0 block_while_resharding ERROR: bucket is still resharding, please retry
> >>> 2018-08-02 16:22:12.253734 7fbe4f5fa700  0 NOTICE: request time skew too big now=2018-08-02 16:22:12.000000 req_time=2018-08-02 16:06:03.000000
> >>>
> >>> [2] $ radosgw-admin reshard list
> >>> [2018-08-02 16:13:19.082172 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000010
> >>> 2018-08-02 16:13:19.082757 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000011
> >>> 2018-08-02 16:13:19.083941 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000012
> >>> 2018-08-02 16:13:19.085170 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000013
> >>> 2018-08-02 16:13:19.085898 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000014
> >>> ]
> >>> 2018-08-02 16:13:19.086476 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000015
> >>>
> >>> [3] $ radosgw-admin reshard status --bucket my-bucket
> >>> [
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     },
> >>>     {
> >>>         "reshard_status": 1,
> >>>         "new_bucket_instance_id":
> >>> "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
> >>>         "num_shards": 32
> >>>     }
> >>> ]
>
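
The reshard status output quoted above has one entry per index shard, so
a quick way to see whether bucket check --fix has cleared the flag is to
count the entries still reporting "reshard_status": 1.  The sketch below
assumes that 1 means a reshard is in progress (which is consistent with
the "still resharding" log messages) and that the output keeps the
line-per-field layout shown above.  count_resharding is a made-up helper
name, not a radosgw-admin feature.

```shell
#!/bin/sh
# Hypothetical helper: reads `radosgw-admin reshard status` output on
# stdin and prints how many shard entries still show reshard_status 1.
count_resharding() {
    # grep -c prints the number of matching lines (0 if none match).
    grep -cE '"reshard_status": *1([^0-9]|$)'
}

# Usage (not executed here; "my-bucket" is the placeholder from the thread):
#   radosgw-admin reshard status --bucket my-bucket | count_resharding
```

A count of 0 after the fix would suggest the status field has been reset
on every shard.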

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
