Thanks for the reply. In my case, it was an issue with the pool's min_size.
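You can check the current value with the standard ceph CLI; a rough sketch for my 'volumes' pool (output format may differ slightly between versions):

# ceph osd pool get volumes min_size
min_size: 2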
The full pool detail:

# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool stripe_width 0 removed_snaps [1~23]

With replicated size=2 and min_size=2, when an OSD goes down the cluster enters HEALTH_ERR and client I/O hangs. (A PG with only one surviving replica falls below min_size, so it stops serving I/O until the OSD comes back or recovery completes.)

ceph status log:

     health HEALTH_ERR
            310 pgs are stuck inactive for more than 300 seconds
            35 pgs backfill_wait
            3 pgs backfilling
            38 pgs degraded
            382 pgs peering
            310 pgs stuck inactive
            310 pgs stuck unclean
            39 pgs undersized
            263 requests are blocked > 32 sec

This is easy to reproduce. I solved it by setting min_size=1 with the command "ceph osd pool set volumes min_size 1".

It seems very strange: if min_size equal to the replicated size can cause such a serious problem for the cluster, I would expect ceph not to allow min_size to be set to the same value as the replicated size.

Thanks.

2017-08-10 23:33 GMT+09:00 David Turner <drakonst...@gmail.com>:

> When the node reboots, are the OSDs being marked down immediately? If the
> node were to reboot but not mark the OSDs down, then all requests to those
> OSDs would block until they got marked down.
>
> On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <hfamil...@gmail.com> wrote:
>
>> Hi, Ramirez
>>
>> I have exactly the same problem as yours.
>> Did you solve that issue?
>> Do you have any experiences or solutions?
>>
>> Thank you.