cluster_network goes slow during erasure code pool's stress testing

huang jun Mon, 21 Dec 2015 07:22:17 -0800

Hi all,
We have hit a problem with an erasure-coded pool with k:m=3:1 and stripe_unit=64k*3.
We have a cluster with 96 OSDs on 4 hosts (srv1, srv2, srv3,
srv4); each host has 24 OSDs,
12 processor cores (Intel(R) Xeon(R) CPU E5-2620 v2 @
2.10GHz) and 48GB of memory.
The cluster is configured with (both networks are 10Gb Ethernet):
cluster_network = 172.19.0.0/16
public_network = 192.168.0.0/16
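
For reference, the arithmetic behind the k:m=3:1 layout can be sketched as below. This is only a rough illustration: it assumes that "64k*3" means a 192 KiB stripe split into three 64 KiB data chunks, and that the primary OSD keeps one shard and sends the remaining shards to peers over the cluster_network. The function name ec_layout is made up for this example.

```python
def ec_layout(k, m, chunk_kib):
    """Rough per-stripe numbers for an erasure-coded pool (illustrative only)."""
    shards = k + m                     # data chunks plus coding chunks
    stripe_kib = k * chunk_kib         # client data carried by one full stripe
    stored_kib = shards * chunk_kib    # data written to disk for that stripe
    # Assumption: the primary holds one shard and ships the rest to peers.
    sent_kib = (shards - 1) * chunk_kib
    return {
        "shards": shards,
        "stripe_kib": stripe_kib,
        "stored_kib": stored_kib,
        "overhead": stored_kib / stripe_kib,
        "sent_to_peers_kib": sent_kib,
    }


# Assumed reading of the pool settings in this report: k=3, m=1, 64 KiB chunks.
layout = ec_layout(k=3, m=1, chunk_kib=64)
print(layout)
```

Under these assumptions every full-stripe write touches four OSDs, so with one shard per host all four hosts take part in each write.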


The test setup is as follows:
1) on each host, mount a kernel client bound to the erasure pool
2) on each host, configure an SMB server that uses the CephFS mount point
3) each Samba server has a Windows SMB client doing file
write/read/delete operations
4) on each kernel client, we run a test shell script that writes a 5GB file
recursively and creates many directories

We started the test at 6:00 pm, but by the next morning the cluster was broken:
1) 48 OSDs were down, on srv1 and srv4
2) I checked the logs of the down OSDs and found two kinds of failure:
a) many OSDs went down due to a Filestore::op_thread suicide timeout
b) many OSDs went down due to an OSD::osd_op_tp suicide timeout
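
A tally of the two failure kinds across many OSD logs could be scripted roughly as follows. This is a minimal sketch, not a tested tool: it only looks for the word "suicide" together with one of the two thread pool names as written in this report, and the sample lines are synthetic placeholders rather than real ceph-osd log output. The function name count_suicides is hypothetical.

```python
from collections import Counter

# Thread pool names as they are written in this report.
POOLS = ("Filestore::op_thread", "OSD::osd_op_tp")


def count_suicides(lines):
    """Count lines that mention a suicide timeout, grouped by thread pool name."""
    counts = Counter()
    for line in lines:
        lower = line.lower()
        if "suicide" not in lower:
            continue
        for pool in POOLS:
            if pool.lower() in lower:
                counts[pool] += 1
    return counts


# Synthetic lines for illustration only, NOT real ceph-osd log output.
sample = [
    "example: OSD::osd_op_tp thread timed out, suicide",
    "example: Filestore::op_thread thread timed out, suicide",
    "example: OSD::osd_op_tp thread timed out, suicide",
    "example: unrelated message",
]
print(count_suicides(sample))
```

In practice the lines would come from each down OSD's log file, and the match strings may need adjusting to the exact wording in those logs.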

Because we have seen this problem before, we used iperf to check the
network between srv1 and srv4.
The public_network is fine; the throughput reaches 9.20 Gbits/sec.
But the cluster_network performs badly from srv1 to srv4;
"iperf -c 172.19.10.4" shows:
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-79.6 sec   384 KBytes  39.5 Kbits/sec
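
As a sanity check on the figures quoted above, the bandwidth can be recomputed from the transfer size and interval. The sketch below parses an iperf summary line and compares the result with the 9.20 Gbits/sec seen on the public_network. It assumes binary prefixes for the Bytes column and decimal prefixes for bits/sec, which is consistent with the quoted numbers; the regular expression and the helper name parse_iperf are made up for this example and only cover lines shaped like the one above.

```python
import re

# Matches an iperf summary line shaped like the one quoted in this report.
LINE_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+"
    r"([\d.]+)\s+([KMG])Bytes\s+([\d.]+)\s+([KMG])bits/sec"
)
BYTE_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}  # assumption: binary prefixes
BIT_UNITS = {"K": 1e3, "M": 1e6, "G": 1e9}            # assumption: decimal prefixes


def parse_iperf(line):
    """Return the interval, the reported rate and a recomputed rate in bits/sec."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not an iperf summary line")
    start, end, amount, byte_unit, rate, bit_unit = match.groups()
    seconds = float(end) - float(start)
    transferred_bits = float(amount) * BYTE_UNITS[byte_unit] * 8
    return {
        "seconds": seconds,
        "reported_bps": float(rate) * BIT_UNITS[bit_unit],
        "recomputed_bps": transferred_bits / seconds,
    }


slow = parse_iperf("[  3]  0.0-79.6 sec   384 KBytes  39.5 Kbits/sec")
public_bps = 9.20e9  # public_network throughput reported above
print(round(slow["recomputed_bps"]))
print(round(public_bps / slow["recomputed_bps"]))
```

The recomputed rate agrees with the reported one, so the cluster_network path from srv1 to srv4 was running several orders of magnitude below the public_network.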

**But** the iperf test from srv4 to srv1 is OK.

**Note**:
a) at this time, no ceph-osd daemons were running on srv1 and srv4
b) after restarting the network, iperf tests in all directions were OK

If the network is this slow, osd_op_tp can get stuck in
submit_message while the reader is receiving data,
which can eventually cause the osd_op_tp thread to hit its suicide timeout.

We have another cluster with the same configuration running the
same tests; the **only** difference is that
this cluster is testing a replicated pool, not an erasure pool.

Why is the network so slow? Is it because the erasure pool uses more CPU and
memory than a replicated pool?

Any hints and tips are welcome.


-- 
thanks
huangjun