Thanks for the reply.
In my case, the issue was the pool's min_size.
# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool
stripe_width 0
removed_snaps [1~23]
When replicated size=2 and min_size=2 are set and an OSD goes down, the ceph
cluster goes into an error state and client I/O hangs.
ceph status log:
health HEALTH_ERR
310 pgs are stuck inactive for more than 300 seconds
35 pgs backfill_wait
3 pgs backfilling
38 pgs degraded
382 pgs peering
310 pgs stuck inactive
310 pgs stuck unclean
39 pgs undersized
263 requests are blocked > 32 sec
You can reproduce this easily.
I solved it by setting min_size=1 with the "ceph osd pool set volumes
min_size 1" command.
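As a sketch, the reproduction and workaround look like this (assuming a test cluster where stopping an OSD is safe; osd.0 is just an example, and the pool name 'volumes' is taken from the output above):

```shell
# The problematic combination: size and min_size both 2.
ceph osd pool set volumes size 2
ceph osd pool set volumes min_size 2

# Stop one OSD holding PGs for the pool. Affected PGs now have only
# 1 of min_size=2 replicas up, so client I/O to them blocks.
systemctl stop ceph-osd@0

# Workaround: allow I/O to continue with a single surviving replica.
ceph osd pool set volumes min_size 1
```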
This is very strange, because if min_size equal to the replicated size can
cause such a big problem for a ceph cluster, ceph should not allow min_size
to be set to the same value as the replicated size.
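The behavior itself follows from the min_size rule: a PG serves client I/O only while at least min_size of its replicas are up. A minimal toy model (not Ceph code, purely an illustration of that rule) shows why size=2 with min_size=2 blocks on a single OSD failure:

```python
def pg_serves_io(replicas_up: int, min_size: int) -> bool:
    """A PG accepts client I/O only while at least min_size replicas are up."""
    return replicas_up >= min_size

# size=2, min_size=2: losing one OSD leaves 1 replica up -> I/O blocks.
print(pg_serves_io(replicas_up=1, min_size=2))  # False

# size=2, min_size=1: the surviving replica keeps serving I/O,
# at the cost of running with no redundancy until recovery completes.
print(pg_serves_io(replicas_up=1, min_size=1))  # True
```

This also shows the trade-off in the workaround: min_size=1 restores availability, but writes acknowledged by a single replica are lost if that OSD fails before recovery.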
Thanks.
2017-08-10 23:33 GMT+09:00 David Turner <[email protected]>:
> When the node reboots, are the osds being marked down immediately? If the
> node were to reboot but not mark the osds down, then all requests to those
> osds would block until they got marked down.
>
> On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <[email protected]> wrote:
>
>> Hi, Ramirez
>>
>> I have exactly the same problem as yours.
>> Did you solve that issue?
>> Do you have any experience or solutions?
>>
>> Thank you.
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>