You should probably have a look at ceph-ansible as it has a "take-over-existing-cluster" playbook. I think versions older than 2.0 support Ceph versions older than Jewel.
---
Alex Cucu

On Fri, Aug 24, 2018 at 4:31 AM Russell Holloway <[email protected]> wrote:
>
> Thanks. Unfortunately even my version of Hammer is too old, at 0.94.5. I think
> my only route to address this issue is to figure out the upgrade, at the very
> least to 0.94.10. The biggest issue, again, is that the deployment tool
> originally used is pinned to 0.94.5, pretty convoluted, and no longer
> receiving updates, but that isn't a Ceph issue.
>
> -Russ
>
> ________________________________
> From: David Turner <[email protected]>
> Sent: Wednesday, August 22, 2018 11:48 PM
> To: Russell Holloway
> Cc: [email protected]
> Subject: Re: [ceph-users] Ceph RGW Index Sharding In Jewel
>
> The release notes for 0.94.10 mention the introduction of the `radosgw-admin
> bucket reshard` command. Red Hat [1] documentation for their Enterprise
> version of Jewel goes into detail on the procedure. You can also search the
> ML archives for the command to find several conversations about the process,
> as well as problems. Make sure the procedure works on a test bucket on
> Hammer before attempting it on your 12M-object bucket.
>
> [1]
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#rados_gateway_user_management
>
> On Wed, Aug 22, 2018, 9:23 PM Russell Holloway <[email protected]>
> wrote:
>
> Did I say Jewel? I was too hopeful. I meant Hammer. This particular cluster
> is Hammer :(
>
> -Russ
>
> ________________________________
> From: ceph-users <[email protected]> on behalf of Russell
> Holloway <[email protected]>
> Sent: Wednesday, August 22, 2018 8:49:19 PM
> To: [email protected]
> Subject: [ceph-users] Ceph RGW Index Sharding In Jewel
>
> So, I've finally journeyed deeper into the depths of Ceph and discovered a
> grand mistake that is likely the root cause of many woeful nights of blocked
> requests.
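[For readers finding this thread later: the 0.94.10 reshard procedure David refers to might look roughly like the following sketch. The bucket name "big-bucket" is hypothetical, and the exact flags should be verified against `radosgw-admin help` on your version before running anything.]

```shell
# 1. Check the bucket's current object count and index layout.
radosgw-admin bucket stats --bucket=big-bucket

# 2. Stop all RGW instances first -- pre-Luminous resharding is an
#    offline operation, so writes to the bucket must be quiesced.
#    (e.g. stop the radosgw service on each gateway node)

# 3. Reshard the index. A commonly cited rule of thumb is on the order
#    of 100k objects per shard, so ~12M objects would suggest roughly
#    128 shards.
radosgw-admin bucket reshard --bucket=big-bucket --num-shards=128

# 4. Restart the gateways and re-check the stats to confirm the new
#    shard count took effect.
radosgw-admin bucket stats --bucket=big-bucket
```

These commands require a live cluster, so treat this as a sketch of the flow rather than a copy-paste procedure; test on a throwaway bucket first, as suggested above.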
> To start off, I'm running Jewel, and I know that is dated and I need to
> upgrade (if anyone knows whether the upgrade is seamless even though I'm
> several major versions behind, do let me know).
>
> My current issue is with an RGW bucket index. I have just discovered I have
> a bucket with about 12M objects in it. Sharding is not enabled on it, and
> it's on a spinning disk, not SSD (the journal is SSD though, so it could be
> worse?). A bad combination, as I just learned. From my recent understanding,
> in Jewel I could maybe update the RGW region to set max shards for buckets,
> but it also sounds like this may or may not affect my existing bucket.
> Furthermore, somewhere I saw mention that prior to Luminous, resharding
> needed to be done offline. I haven't found any documentation on that
> process, though. There is some mention of putting bucket indexes on SSD for
> performance and latency reasons, which sounds great, but I get the feeling
> that if I modified the CRUSH map, tried to get the index pool onto SSDs, and
> started moving things around involving this PG, it would fail the same way I
> can't even do a deep scrub on the PG.
>
> Does anyone have a good reference on how I could begin to clean this bucket
> up or get it sharded while on Jewel? Again, it sounds like Luminous may just
> start resharding it and fix things right up, but I feel going to Luminous
> will require more work and testing (mostly due to my original deployment
> tool, Fuel 8 for OpenStack, being bound to Jewel with no easy upgrade path
> for Fuel... I'll have to sort out how to transition away from that while
> maintaining my existing nodes).
>
> The core issue was identified when I took finer-grained control over deep
> scrubs and triggered them manually. I eventually found I could make my
> entire Ceph cluster hang by triggering a deep scrub on a single PG, which
> happens to be the one hosting this index.
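[Editor's note on the "update the RGW region to set max shards" idea above: a hedged sketch of that procedure follows. Note that this setting only applies to buckets created *after* the change; it does not reshard an existing bucket. The shard count of 16 is illustrative.]

```shell
# Option A: a per-gateway default in ceph.conf (new buckets only):
#   [client.rgw.<instance>]
#   rgw override bucket index max shards = 16

# Option B: set it in the region so it applies across gateways.
# Dump the region definition, edit it, and push it back:
radosgw-admin region get > region.json
# edit region.json: set "bucket_index_max_shards" to e.g. 16
radosgw-admin region set < region.json
radosgw-admin regionmap update
# then restart the RGW daemons to pick up the change
```

Since this does not touch already-created buckets, it would prevent the problem recurring but would not help the existing 12M-object index.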
> The OSD hosting it basically becomes unresponsive for a very long time and
> begins blocking a lot of other requests, affecting all sorts of VMs using
> RBD. I could simply not deep scrub this PG (Ceph ends up marking the OSD
> down, and the deep scrub seems to fail and never completes; about 30 minutes
> after the hung requests, the cluster eventually recovers), but I know I need
> to address this bucket sizing issue and then try to work on upgrading Ceph.
>
> Is it doable? For what it's worth, I tried to list the keys in Ceph with
> rados and that also hung requests. I'm not quite sure how to break the
> bucket up at the software level, especially if I cannot list the contents,
> so I hope within Ceph there is some route forward here...
>
> Thanks a bunch in advance for helping a naive Ceph operator.
>
> -Russ
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
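[Editor's note: for the "deep scrub on this one PG hangs everything" problem described above, a hedged sketch of locating the index PG and keeping scrubs away from it in the meantime follows. Pool and bucket names are illustrative; on Hammer the bucket index typically lives in the `.rgw.buckets.index` pool, and an unsharded bucket's index is a single `.dir.<bucket_id>` omap object.]

```shell
# Find the bucket's ID, then map its index object to a PG and acting OSDs:
radosgw-admin metadata get bucket:big-bucket     # note the bucket_id field
ceph osd map .rgw.buckets.index .dir.<bucket_id> # shows the PG and OSDs

# While planning the reshard/upgrade, keep scrubs from touching it.
# These flags are cluster-wide, so use them as a temporary measure:
ceph osd set noscrub
ceph osd set nodeep-scrub

# ...and remember to clear them once the index is fixed:
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub
```

This does not fix the oversized index, but it avoids the cluster-wide hangs until the reshard or upgrade can be done.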
