Hi Bryan.

I hope that solved it for you.
Another thing you can do in situations like this is to set the full_ratio
higher so you can work on the problem. Always set it back to a safe value
after the issue is solved.

*ceph pg set_full_ratio 0.98*
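
If you're on Luminous (your logs below show 12.2.0), I believe that
pg-level command was replaced by the osd map variant, so the equivalent
should be something like:

*ceph osd set-full-ratio 0.98*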



Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*

On Tue, Oct 17, 2017 at 6:52 PM, Bryan Banister <bbanis...@jumptrading.com>
wrote:

> Thanks for the response, we increased our pg count to something more
> reasonable (512 for now) and things are rebalancing.
>
>
>
> Cheers,
>
> -Bryan
>
>
>
> *From:* Andreas Calminder [mailto:andreas.calmin...@klarna.com]
> *Sent:* Tuesday, October 17, 2017 3:48 PM
> *To:* Bryan Banister <bbanis...@jumptrading.com>
> *Cc:* Ceph Users <ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Help with full osd and RGW not responsive
>
>
>
> *Note: External Email*
> ------------------------------
>
> Hi,
>
> You should most definitely look over number of pgs, there's a pg
> calculator available here: http://ceph.com/pgcalc/
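>
> As a rough rule of thumb (the calculator refines this), aim for about
> (number of OSDs * 100) / replica count PGs in total, rounded to a power
> of two. Assuming 3 replicas (I don't know your actual pool size), your
> 134 OSDs would give 134 * 100 / 3 ~ 4466, so on the order of 4096 PGs
> across all pools.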
>
>
>
> You can increase the pg count of a pool, but you cannot decrease it (
> http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/)
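>
> For example, to bring a pool up to 512 PGs (using your data pool name
> from below; adjust the number to whatever the calculator suggests):
>
> ceph osd pool set carf01.rgw.buckets.data pg_num 512
> ceph osd pool set carf01.rgw.buckets.data pgp_num 512
>
> Raising pg_num creates the new placement groups; raising pgp_num is what
> actually triggers the data to rebalance onto them.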
>
>
>
> To solve the immediate problem of your cluster being full, you can
> reweight your osds: giving the full osd a lower weight will cause writes
> to go to other osds and data on that osd to migrate elsewhere in
> the cluster: ceph osd reweight $OSDNUM $WEIGHT, described here
> http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem
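>
> For instance, your health output shows osd.5 is the full one, so
> something like this (the 0.85 is just an example weight, pick what fits
> your situation):
>
> ceph osd reweight 5 0.85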
>
>
>
> Once the osd is no longer above the full threshold (95% by default), the
> cluster will clear its full flag and your radosgw should start accepting
> write operations again, at least until another osd gets full; the main
> problem here is probably the low pg count.
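>
> You can keep an eye on per-osd utilization while the data migrates with:
>
> ceph osd df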
>
>
>
> Regards,
>
> Andreas
>
>
>
> On 17 Oct 2017 19:08, "Bryan Banister" <bbanis...@jumptrading.com> wrote:
>
> Hi all,
>
>
>
> Still a real novice here, and we didn't set up our initial RGW cluster
> very well.  We have 134 osds but set up our RGW pool with only 64 PGs, so
> not all of our OSDs got data and now we have one that is 95% full.
>
>
>
> This apparently has put the cluster into a HEALTH_ERR condition:
>
> [root@carf-ceph-osd01 ~]# ceph health detail
>
> HEALTH_ERR full flag(s) set; 1 full osd(s); 1 pools have many more objects
> per pg than average; application not enabled on 6 pool(s); too few PGs per
> OSD (26 < min 30)
>
> OSDMAP_FLAGS full flag(s) set
>
> OSD_FULL 1 full osd(s)
>
>     osd.5 is full
>
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>
>     pool carf01.rgw.buckets.data objects per pg (602762) is more than
> 18.3752 times cluster average (32803)
>
>
>
> There is plenty of space on most of the OSDs, but we don't know how to go
> about fixing this situation.  If we update the pg_num and pgp_num settings
> for this pool, can we rebalance the data across the OSDs?
>
>
>
> Also, seems like this is causing a problem with the RGWs, which was
> reporting this error in the logs:
>
> 2017-10-16 16:36:47.534461 7fffe6c5c700  1 heartbeat_map is_healthy
> 'RGWAsyncRadosProcessor::m_tp thread 0x7fffdc447700' had timed out after 600
>
>
>
> After trying to restart the RGW, we see this now:
>
> 2017-10-17 10:40:38.517002 7fffe6c5c700  1 heartbeat_map is_healthy
> 'RGWAsyncRadosProcessor::m_tp thread 0x7fffddc4a700' had timed out after 600
>
> 2017-10-17 10:40:42.124046 7ffff7fd4e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
>
> 2017-10-17 10:40:42.124162 7ffff7fd4e00  0 ceph version 12.2.0 (
> 32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> (unknown), pid 65313
>
> 2017-10-17 10:40:42.245259 7ffff7fd4e00  0 client.769905.objecter  FULL,
> paused modify 0x55555662fb00 tid 0
>
> 2017-10-17 10:45:42.124283 7fffe7bcf700 -1 Initialization timeout, failed
> to initialize
>
> 2017-10-17 10:45:42.353496 7ffff7fd4e00  0 deferred set uid:gid to 167:167
> (ceph:ceph)
>
> 2017-10-17 10:45:42.353618 7ffff7fd4e00  0 ceph version 12.2.0 (
> 32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process
> (unknown), pid 71842
>
> 2017-10-17 10:45:42.388621 7ffff7fd4e00  0 client.769986.objecter  FULL,
> paused modify 0x55555662fb00 tid 0
>
> 2017-10-17 10:50:42.353731 7fffe7bcf700 -1 Initialization timeout, failed
> to initialize
>
>
>
> Seems pretty evident that the “FULL, paused” is a problem.  So if I fix
> the first issue the RGW should be ok after?
>
>
>
> Thanks in advance,
>
> -Bryan
>
>
> ------------------------------
>
>
> Note: This email is for the confidential use of the named addressee(s)
> only and may contain proprietary, confidential or privileged information.
> If you are not the intended recipient, you are hereby notified that any
> review, dissemination or copying of this email is strictly prohibited, and
> to please notify the sender immediately and destroy this email and any
> attachments. Email transmission cannot be guaranteed to be secure or
> error-free. The Company, therefore, does not make any guarantees as to the
> completeness or accuracy of this email or any attachments. This email is
> for informational purposes only and does not constitute a recommendation,
> offer, request or solicitation of any kind to buy, sell, subscribe, redeem
> or perform any type of transaction of a financial product.
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
