Hi Janne and others,

We used the “ceph osd reweight-by-utilization” command to move a small amount of data off the top four OSDs by utilization. Then we increased pg_num and pgp_num on the pool from 512 to 1024, which started moving roughly 50% of the objects around. Unfortunately, the weights on the OSDs are still roughly equivalent, and the OSDs that are nearfull were still being allocated objects during the rebalance backfill operations.
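As a rough illustration of why going from 512 to 1024 PGs touches about half of the objects, here is a minimal Python sketch. It is a simplification under stated assumptions: Ceph hashes object names with its own function (rjenkins by default) and a stable-mod step, whereas this sketch substitutes SHA-256 and a plain bit mask, which only works for power-of-two PG counts. The object names are hypothetical. Objects that land in a new child PG only become candidates for movement; whether they actually move depends on where CRUSH places that PG once pgp_num is raised.

```python
import hashlib

def pg_for(name: str, pg_num: int) -> int:
    """Map an object name to a PG index.

    Stand-in for Ceph's real name hash and stable mod; valid only
    when pg_num is a power of two.
    """
    digest = hashlib.sha256(name.encode()).digest()
    h = int.from_bytes(digest[:4], "big")
    return h & (pg_num - 1)

# Hypothetical object names, differing only in a trailing counter.
names = [f"backup-{i:06d}" for i in range(10000)]

old = [pg_for(n, 512) for n in names]
new = [pg_for(n, 1024) for n in names]

# An object either stays in its PG or lands in the child PG (old + 512),
# depending on one extra bit of the hash.
moved = sum(o != n for o, n in zip(old, new))
moved_fraction = moved / len(names)
print(f"{moved_fraction:.1%} of objects map to a new PG")
```

Each original PG splits in two, and one additional hash bit decides which half an object falls into, so about half of the objects end up in the new PGs.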
At this point I have made some massive changes to the weights of the OSDs in an attempt to stop Ceph from allocating any more data to OSDs that are getting close to full. Basically, the OSD with the lowest utilization remains weighted at 1, and the rest of the OSDs are now reduced in weight based on the percent usage of the OSD + the percent usage of the OSD with the least amount of data (21% at the time). This means the fullest OSD, currently at 86%, now has a weight of only .33 (it was at 89% when the reweight was applied). I’m not sure this is a good idea, but it seemed like the only option I had. Please let me know if I’m making a bad situation worse!

I still have the question of how this happened in the first place, and how to prevent it from happening going forward without a lot of monitoring and reweighting on weekends, etc. to keep things balanced.

It sounds like Ceph really expects objects stored in a pool to be roughly the same size. Is that right? The backups going into this pool vary widely in size, so would it be better to create multiple pools based on expected object size and then put backups of similar size into each pool?

The backups also have basically the same names, with the only difference being the date on which each was taken (e.g. the names of backups on subsequent days can differ by a single digit). Does this mean that large backups with basically the same name will end up in the same PGs, given that the CRUSH calculation uses the object name?
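To make the reweighting rule concrete, here is one possible reading of it as a small Python sketch: each OSD's weight is 1 minus how far its usage sits above the least-used OSD. This is an interpretation of the description above, not necessarily the exact arithmetic that was used. The OSD ids and the 55% figure are hypothetical; only the 21% and 89% values come from this thread. The printed lines use the "ceph osd reweight" command form and are shown for review only, nothing is executed.

```python
def proposed_weight(usage: float, min_usage: float) -> float:
    """Weight = 1 - (usage - min_usage), clamped to the range [0, 1].

    Assumed reading of the rule described in the message.
    """
    return max(0.0, min(1.0, 1.0 - (usage - min_usage)))

# Hypothetical OSD ids mapped to fractional utilization.
usages = {0: 0.21, 1: 0.55, 2: 0.89}

min_usage = min(usages.values())
weights = {
    osd: round(proposed_weight(u, min_usage), 2)
    for osd, u in usages.items()
}

# Print the commands instead of running them, so they can be reviewed first.
for osd, weight in weights.items():
    print(f"ceph osd reweight {osd} {weight}")
```

Under this reading the least-used OSD keeps a weight of 1 and an OSD at 89% gets about 0.32, close to the .33 mentioned above.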
Thanks,
-Bryan

From: Janne Johansson [mailto:icepic...@gmail.com]
Sent: Wednesday, January 31, 2018 9:34 AM
To: Bryan Banister <bbanis...@jumptrading.com>
Cc: Ceph Users <email@example.com>
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-01-31 15:58 GMT+01:00 Bryan Banister <bbanis...@jumptrading.com>:

> Given that this will move data around (I think), should we increase the pg_num and pgp_num first and then see how it looks?

I guess adding pgs and pgps will move stuff around too, but if the PGCALC formula says you should have more, then that would still be a good start.

Still, a few manual reweights first to take the 85-90% ones down might be good. Some move operations are going to refuse to add things to too-full OSDs, so you would not want to get accidentally bumped above such a limit due to some temp data being created during moves.

Also, don't bump pgs like crazy; you can never move down. Aim for getting ~100 per OSD at most, and perhaps even then in smaller steps, so that the creation (and evening out of data to the new empty PGs) doesn't kill normal client I/O perf in the meantime.

--
May the most significant bit of your life be positive.