Thanks for the reply.

In my case, the issue was the pool's min_size setting.

# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool
stripe_width 0
        removed_snaps [1~23]

When a pool has replicated size=2 and min_size=2 and an OSD goes down, the
cluster goes into an error state and client I/O hangs.
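As far as I understand, this is standard min_size behaviour: a PG only
accepts client I/O while at least min_size replicas are up, so with size=2
and min_size=2 a single OSD failure drops the affected PGs below min_size
and they stop serving I/O. You can check the pool's current values with the
usual commands:

# ceph osd pool get volumes size
# ceph osd pool get volumes min_size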

ceph status log:
health HEALTH_ERR
            310 pgs are stuck inactive for more than 300 seconds
            35 pgs backfill_wait
            3 pgs backfilling
            38 pgs degraded
            382 pgs peering
            310 pgs stuck inactive
            310 pgs stuck unclean
            39 pgs undersized
            263 requests are blocked > 32 sec
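If anyone wants to dig into the same situation, the stuck PGs and the
blocked requests can be inspected with the standard commands (nothing
specific to my cluster here):

# ceph health detail
# ceph pg dump_stuck inactive
# ceph pg dump_stuck unclean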

You can reproduce this easily.
I solved it by setting min_size=1 with the "ceph osd pool set volumes
min_size 1" command.
It seems very strange: if min_size equal to the replicated size can cause
such a big problem for the cluster, I would expect Ceph not to allow setting
them to the same value.
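For reference, this is roughly the sequence I used; the check commands are
just the standard ones, adjust the pool name for your setup:

# ceph osd pool set volumes min_size 1
# ceph osd pool get volumes min_size
# ceph -s

Of course min_size=1 means a PG will accept writes while only one copy is
available, so it is a trade-off against durability.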

Thanks.

2017-08-10 23:33 GMT+09:00 David Turner <drakonst...@gmail.com>:

> When the node rebooted, were the osds marked down immediately? If the
> node were to reboot, but not mark the osds down, then all requests to those
> osds would block until they got marked down.
>
> On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <hfamil...@gmail.com> wrote:
>
>> Hi, Ramirez
>>
>> I have exactly the same problem as yours.
>> Did you solve that issue?
>> Do you have any experience or a solution?
>>
>> Thank you.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
