You should probably have a look at ceph-ansible as it has a "take-over-existing-cluster" playbook. I think versions older than 2.0 support Ceph versions older than Jewel.
---
Alex Cucu

On Fri, Aug 24, 2018 at 4:31 AM Russell Holloway <[email protected]> wrote:
>
> Thanks. Unfortunately even my version of Hammer is too old, at 0.94.5. I think
> my only route to address this issue is to figure out the upgrade, at the very
> least to 0.94.10. The biggest issue, again, is that the deployment tool
> originally used is pinned to 0.94.5, pretty convoluted, and no longer
> receiving updates, but that isn't a Ceph issue.
>
> -Russ
>
> ________________________________
> From: David Turner <[email protected]>
> Sent: Wednesday, August 22, 2018 11:48 PM
> To: Russell Holloway
> Cc: [email protected]
> Subject: Re: [ceph-users] Ceph RGW Index Sharding In Jewel
>
> The release notes for 0.94.10 mention the introduction of the `radosgw-admin
> bucket reshard` command. Red Hat [1] documentation for their Enterprise
> version of Jewel goes into detail on the procedure. You can also search the
> ML archives for the command to find several conversations about the process,
> as well as problems. Make sure the procedure works on a test bucket on
> Hammer before attempting it on your 12M-object bucket.
>
> [1]
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#rados_gateway_user_management
>
> On Wed, Aug 22, 2018, 9:23 PM Russell Holloway <[email protected]>
> wrote:
>
> Did I say Jewel? I was too hopeful. I meant Hammer. This particular cluster
> is Hammer :(
>
> -Russ
>
> ________________________________
> From: ceph-users <[email protected]> on behalf of Russell
> Holloway <[email protected]>
> Sent: Wednesday, August 22, 2018 8:49:19 PM
> To: [email protected]
> Subject: [ceph-users] Ceph RGW Index Sharding In Jewel
>
> So, I've finally journeyed deeper into the depths of Ceph and discovered a
> grand mistake that is likely the root cause of many woeful nights of blocked
> requests.
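[For readers finding this thread later: the 0.94.10 reshard procedure David refers to might look roughly like the following sketch. The bucket name "big-bucket" is hypothetical, and the exact flags should be verified against `radosgw-admin help` on your version before running anything.]

```shell
# 1. Check the bucket's current object count and index layout.
radosgw-admin bucket stats --bucket=big-bucket

# 2. Stop all RGW instances first -- pre-Luminous resharding is an
#    offline operation, so writes to the bucket must be quiesced.
#    (e.g. stop the radosgw service on each gateway node)

# 3. Reshard the index. A commonly cited rule of thumb is on the order
#    of 100k objects per shard, so ~12M objects would suggest roughly
#    128 shards.
radosgw-admin bucket reshard --bucket=big-bucket --num-shards=128

# 4. Restart the gateways and re-check the stats to confirm the new
#    shard count took effect.
radosgw-admin bucket stats --bucket=big-bucket
```

These commands require a live cluster, so treat this as a sketch of the flow rather than a copy-paste procedure; test on a throwaway bucket first, as suggested above.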
> To start off, I'm running Jewel, and I know that is dated and I need to
> upgrade (if anyone knows whether the upgrade is seamless even though I'm
> several major versions behind, do let me know).
>
> My current issue is with an RGW bucket index. I have just discovered I have
> a bucket with about 12M objects in it. Sharding is not enabled on it, and
> it's on a spinning disk, not SSD (the journal is SSD though, so it could be
> worse?). A bad combination, as I just learned. From my recent understanding,
> in Jewel I could maybe update the RGW region to set max shards for buckets,
> but it also sounds like this may or may not affect my existing bucket.
> Furthermore, somewhere I saw mention that prior to Luminous, resharding
> needed to be done offline. I haven't found any documentation on that
> process, though. There is some mention of putting bucket indexes on SSD for
> performance and latency reasons, which sounds great, but I get the feeling
> that if I modified the CRUSH map, tried to get the index pool onto SSDs, and
> started moving things around involving this PG, it would fail the same way I
> can't even do a deep scrub on the PG.
>
> Does anyone have a good reference on how I could begin to clean this bucket
> up or get it sharded while on Jewel? Again, it sounds like Luminous may just
> start resharding it and fix things right up, but I feel going to Luminous
> will require more work and testing (mostly due to my original deployment
> tool, Fuel 8 for OpenStack, being bound to Jewel with no easy upgrade path
> for Fuel... I'll have to sort out how to transition away from that while
> maintaining my existing nodes).
>
> The core issue was identified when I took finer-grained control over deep
> scrubs and triggered them manually. I eventually found I could make my
> entire Ceph cluster hang by triggering a deep scrub on a single PG, which
> happens to be the one hosting this index.
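[Editor's note on the "update the RGW region to set max shards" idea above: a hedged sketch of that procedure follows. Note that this setting only applies to buckets created *after* the change; it does not reshard an existing bucket. The shard count of 16 is illustrative.]

```shell
# Option A: a per-gateway default in ceph.conf (new buckets only):
#   [client.rgw.<instance>]
#   rgw override bucket index max shards = 16

# Option B: set it in the region so it applies across gateways.
# Dump the region definition, edit it, and push it back:
radosgw-admin region get > region.json
# edit region.json: set "bucket_index_max_shards" to e.g. 16
radosgw-admin region set < region.json
radosgw-admin regionmap update
# then restart the RGW daemons to pick up the change
```

Since this does not touch already-created buckets, it would prevent the problem recurring but would not help the existing 12M-object index.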
> The OSD hosting it basically becomes unresponsive for a very long time and
> begins blocking a lot of other requests, affecting all sorts of VMs using
> RBD. I could simply not deep scrub this PG (Ceph ends up marking the OSD
> down, and the deep scrub seems to fail and never completes; about 30 minutes
> after the hung requests, the cluster eventually recovers), but I know I need
> to address this bucket sizing issue and then try to work on upgrading Ceph.
>
> Is it doable? For what it's worth, I tried to list the keys in Ceph with
> rados and that also hung requests. I'm not quite sure how to break the
> bucket up at the software level, especially if I cannot list the contents,
> so I hope within Ceph there is some route forward here...
>
> Thanks a bunch in advance for helping a naive Ceph operator.
>
> -Russ
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
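[Editor's note: for the "deep scrub on this one PG hangs everything" problem described above, a hedged sketch of locating the index PG and keeping scrubs away from it in the meantime follows. Pool and bucket names are illustrative; on Hammer the bucket index typically lives in the `.rgw.buckets.index` pool, and an unsharded bucket's index is a single `.dir.<bucket_id>` omap object.]

```shell
# Find the bucket's ID, then map its index object to a PG and acting OSDs:
radosgw-admin metadata get bucket:big-bucket     # note the bucket_id field
ceph osd map .rgw.buckets.index .dir.<bucket_id> # shows the PG and OSDs

# While planning the reshard/upgrade, keep scrubs from touching it.
# These flags are cluster-wide, so use them as a temporary measure:
ceph osd set noscrub
ceph osd set nodeep-scrub

# ...and remember to clear them once the index is fixed:
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub
```

This does not fix the oversized index, but it avoids the cluster-wide hangs until the reshard or upgrade can be done.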
