Hello,
On Tue, 24 May 2016 10:28:02 +0700 Никитенко Виталий wrote:
> Hello!
> I have a cluster of 2 nodes with 3 OSDs each. The cluster is about 80% full.
>
According to your CRUSH map that's not quite true; note the ceph1-node2
entry.
And while that bucket, again according to your CRUSH map, isn't in the
default root, I wonder WHERE it is and whether it confuses Ceph into
believing that there is actually a third node.
"ceph osd tree" output may help, as well as removing ceph1-node2 from the
picture.
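Something along these lines should show whether ceph1-node2 really exists
as a bucket and, if it turns out to be empty and unused, let you drop it
(bucket name taken from your CRUSH map; verify first that nothing maps
through it):

```shell
# Show the CRUSH hierarchy; a stray ceph1-node2 bucket would appear
# here, possibly outside the "default" root.
ceph osd tree

# If ceph1-node2 is an empty, unused bucket, remove it from the
# CRUSH map entirely:
ceph osd crush remove ceph1-node2
```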
> df -H
> /dev/sdc1 27G 24G 3.9G 86% /var/lib/ceph/osd/ceph-1
> /dev/sdd1 27G 20G 6.9G 75% /var/lib/ceph/osd/ceph-2
> /dev/sdb1 27G 24G 3.5G 88% /var/lib/ceph/osd/ceph-0
>
> When I switch off one server, after about 10 minutes Ceph begins
> remapping PGs.
>
[snip]
> As a result, one disk overflows and the cluster fails. Why does Ceph
> remap PGs? It was supposed to simply mark all PGs as active+degraded
> while the second node is down.
>
Yes, I agree, that shouldn't happen with a properly configured 2 node
cluster.
> ceph version 0.80.11
>
I'm not aware of any bugs in that area, and I did in fact test a 2 node
cluster with Firefly. Be aware, however, that this version is EoL and no
longer receives updates.
> root@ceph1-node:~# cat /etc/ceph/ceph.conf
> [global]
> fsid = b66c7daa-d6d8-46c7-9e61-15adbb749ed7
> mon_initial_members = ceph1-node, ceph2-node, ceph-mon2
> mon_host = 192.168.241.97,192.168.241.110,192.168.241.123
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
Have you verified (ceph osd pool get <poolname> size / min_size) that all
your pools are actually set like this?
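A quick way to check every pool at once (the lspools parsing below assumes
the "N name,N name," output format of Firefly-era releases; adjust if
yours differs):

```shell
# List size/min_size for every pool; these should print "size: 2"
# and "min_size: 1" if the ceph.conf defaults were applied.
for pool in $(ceph osd lspools | tr ',' '\n' | awk '{print $2}'); do
    echo "== $pool =="
    ceph osd pool get "$pool" size
    ceph osd pool get "$pool" min_size
done

# Fix a pool that was created before the defaults were set
# (<poolname> is a placeholder for the affected pool):
ceph osd pool set <poolname> size 2
ceph osd pool set <poolname> min_size 1
```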
Regards,
Christian
> mon_clock_drift_allowed = 2
>
>
> root@ceph1-node:~# cat crush-map.txt
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable straw_calc_version 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host ceph1-node {
> id -2 # do not change unnecessarily
> # weight 0.060
> alg straw
> hash 0 # rjenkins1
> item osd.0 weight 0.020
> item osd.1 weight 0.020
> item osd.2 weight 0.020
> }
>
>
>
> host ceph2-node
> { id -3 # do not change unnecessarily
> # weight 0.060
> alg straw
> hash 0 # rjenkins1
> item osd.3 weight 0.020
> item osd.4 weight 0.020
> item osd.5 weight 0.020
> }
> root default {
> id -1 # do not change unnecessarily
> # weight 0.120
> alg straw
> hash 0 # rjenkins1
> item ceph1-node weight 0.060
> item ceph2-node weight 0.060
> }
> host ceph1-node2 {
> id -4 # do not change unnecessarily
> # weight 3.000
> alg straw
> hash 0 # rjenkins1
> item osd.0 weight 1.000
> item osd.1 weight 1.000
> item osd.2 weight 1.000
> }
>
> # rules
> rule replicated_ruleset {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> # end crush map
>
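If ceph1-node2 does need to come out of the CRUSH map by hand, the usual
decompile/edit/recompile round trip is (file names here are arbitrary):

```shell
# Pull the current CRUSH map and decompile it to editable text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# ... edit crushmap.txt, deleting the "host ceph1-node2 { ... }" block ...

# Recompile and inject the edited map back into the cluster
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```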
--
Christian Balzer Network/Systems Engineer
[email protected] Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com