Hi Bryan. I hope that solved it for you. Another thing you can do in situations like this is to temporarily set the full_ratio higher so you have room to work on the problem. Always set it back to a safe value after the issue is solved.
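A minimal sketch of that raise-then-restore sequence, with the commands printed via echo rather than executed (they assume a live cluster and an admin keyring; note that on Luminous, which Bryan's logs below show, the syntax changed to `ceph osd set-full-ratio`):

```shell
# Temporarily raise the full ratio so recovery I/O can proceed.
RAISE="ceph pg set_full_ratio 0.98"
# Restore the default (0.95) once the cluster has rebalanced.
RESTORE="ceph pg set_full_ratio 0.95"

# Printed rather than executed; drop the echoes on a real cluster.
echo "$RAISE"
# ... reweight OSDs / raise pg_num while there is headroom ...
echo "$RESTORE"
```

Leaving the ratio at 0.98 defeats the safety margin that keeps OSDs from filling completely, which is why restoring the default afterwards matters.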
*ceph pg set_full_ratio 0.98*

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*

On Tue, Oct 17, 2017 at 6:52 PM, Bryan Banister <bbanis...@jumptrading.com> wrote:

> Thanks for the response, we increased our pg count to something more
> reasonable (512 for now) and things are rebalancing.
>
> Cheers,
> -Bryan
>
> *From:* Andreas Calminder [mailto:andreas.calmin...@klarna.com]
> *Sent:* Tuesday, October 17, 2017 3:48 PM
> *To:* Bryan Banister <bbanis...@jumptrading.com>
> *Cc:* Ceph Users <ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Help with full osd and RGW not responsive
>
> *Note: External Email*
> ------------------------------
>
> Hi,
>
> You should most definitely look over the number of pgs; there's a pg
> calculator available here: http://ceph.com/pgcalc/
>
> You can increase pgs but not decrease them
> (http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/).
>
> To solve the immediate problem with your cluster being full you can
> reweight your osds: giving the full osd a lower weight will cause writes
> to go to other osds and data on that osd to be migrated to other osds in
> the cluster: ceph osd reweight $OSDNUM $WEIGHT, described here:
> http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem
>
> When the osd isn't above the full threshold (default is 95%), the cluster
> will clear its full flag and your radosgw should start accepting write
> operations again, at least until another osd gets full; the main problem
> here is probably the low pg count.
>
> Regards,
> Andreas
>
> On 17 Oct 2017 19:08, "Bryan Banister" <bbanis...@jumptrading.com> wrote:
>
> Hi all,
>
> Still a real novice here and we didn't set up our initial RGW cluster very
> well. We have 134 osds and set up our RGW pool with only 64 PGs, thus not
> all of our OSDs got data and now we have one that is 95% full.
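To make the pg-count advice concrete: the rule of thumb behind http://ceph.com/pgcalc/ is roughly (OSDs × 100) / replicas, rounded to a power of two. A minimal sketch, assuming a replica count of 3 (the thread does not state the pool size) and printing the pool-set commands rather than running them; the pool name is taken from the health output in this thread:

```shell
OSDS=134            # from Bryan's description
TARGET_PER_OSD=100  # common rule-of-thumb target
REPLICAS=3          # assumed pool size; not stated in the thread

RAW=$(( OSDS * TARGET_PER_OSD / REPLICAS ))

# Round down to the nearest power of two (a conservative choice; the
# pg calculator rounds to the nearest power of two instead).
PG=1
while [ $(( PG * 2 )) -le "$RAW" ]; do
  PG=$(( PG * 2 ))
done
echo "Suggested pg_num: $PG"

# The commands themselves, printed rather than executed (drop the
# echoes on a real cluster). pgp_num cannot exceed pg_num, so pg_num
# is raised first; only after pgp_num changes does CRUSH move data.
POOL="carf01.rgw.buckets.data"
echo "ceph osd pool set $POOL pg_num $PG"
echo "ceph osd pool set $POOL pgp_num $PG"
```

By this arithmetic, 134 OSDs at 3x replication suggest a pg_num of 4096; Bryan's interim 512 is already a big step up from 64, though still below the rule-of-thumb figure.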
> This apparently has put the cluster into a HEALTH_ERR condition:
>
> [root@carf-ceph-osd01 ~]# ceph health detail
> HEALTH_ERR full flag(s) set; 1 full osd(s); 1 pools have many more objects
> per pg than average; application not enabled on 6 pool(s); too few PGs per
> OSD (26 < min 30)
> OSDMAP_FLAGS full flag(s) set
> OSD_FULL 1 full osd(s)
>     osd.5 is full
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>     pool carf01.rgw.buckets.data objects per pg (602762) is more than
>     18.3752 times cluster average (32803)
>
> There is plenty of space on most of the OSDs and we don't know how to go
> about fixing this situation. If we update the pg_num and pgp_num settings
> for this pool, can we rebalance the data across the OSDs?
>
> Also, it seems like this is causing a problem with the RGWs, which were
> reporting this error in the logs:
>
> 2017-10-16 16:36:47.534461 7fffe6c5c700  1 heartbeat_map is_healthy
> 'RGWAsyncRadosProcessor::m_tp thread 0x7fffdc447700' had timed out after 600
>
> After trying to restart the RGW, we see this now:
>
> 2017-10-17 10:40:38.517002 7fffe6c5c700  1 heartbeat_map is_healthy
> 'RGWAsyncRadosProcessor::m_tp thread 0x7fffddc4a700' had timed out after 600
> 2017-10-17 10:40:42.124046 7ffff7fd4e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
> 2017-10-17 10:40:42.124162 7ffff7fd4e00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> (unknown), pid 65313
> 2017-10-17 10:40:42.245259 7ffff7fd4e00  0 client.769905.objecter FULL,
> paused modify 0x55555662fb00 tid 0
> 2017-10-17 10:45:42.124283 7fffe7bcf700 -1 Initialization timeout, failed
> to initialize
> 2017-10-17 10:45:42.353496 7ffff7fd4e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
> 2017-10-17 10:45:42.353618 7ffff7fd4e00  0 ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> (unknown), pid 71842
> 2017-10-17 10:45:42.388621 7ffff7fd4e00  0
> client.769986.objecter FULL,
> paused modify 0x55555662fb00 tid 0
> 2017-10-17 10:50:42.353731 7fffe7bcf700 -1 Initialization timeout, failed
> to initialize
>
> It seems pretty evident that the "FULL, paused" state is the problem. So if
> I fix the first issue, the RGW should be ok after?
>
> Thanks in advance,
> -Bryan
>
> ------------------------------
> Note: This email is for the confidential use of the named addressee(s)
> only and may contain proprietary, confidential or privileged information.
> If you are not the intended recipient, you are hereby notified that any
> review, dissemination or copying of this email is strictly prohibited, and
> to please notify the sender immediately and destroy this email and any
> attachments. Email transmission cannot be guaranteed to be secure or
> error-free. The Company, therefore, does not make any guarantees as to the
> completeness or accuracy of this email or any attachments. This email is
> for informational purposes only and does not constitute a recommendation,
> offer, request or solicitation of any kind to buy, sell, subscribe, redeem
> or perform any type of transaction of a financial product.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
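For completeness, Andreas's reweight suggestion earlier in the thread can be sketched as follows. The OSD number comes from the health output (osd.5); the weight 0.85 is an illustrative value only, and the command is printed rather than executed since it assumes a live cluster:

```shell
OSDNUM=5      # the full OSD from `ceph health detail`
WEIGHT=0.85   # illustrative; low enough to push some data elsewhere

CMD="ceph osd reweight $OSDNUM $WEIGHT"
# Printed rather than executed; drop the echo on a real cluster.
echo "$CMD"
```

Once osd.5 drops back below the 95% threshold, the cluster clears its full flag and the RGW's "FULL, paused" state should resolve, as Andreas described.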