Hi Vickey,
I had this exact same problem last week and resolved it by rebooting all of
my OSD nodes. I have yet to figure out why it happened, though. I _suspect_
that in my case it was a failing controller on a particular box I've had
trouble with in the past.
I tried setting 'noout', stopping my OSDs one host at a time, then
rerunning RADOS bench in between to see if I could nail down the
problematic machine. Depending on your number of hosts, this might work for
you. Admittedly, I got impatient with this approach and just ended up
restarting everything (which worked!) :)
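In case it's useful, the sequence I was running looked roughly like this
(the osd ids are just examples from my setup, and the sysvinit invocation
is what CentOS 6 / Hammer uses, so adjust for your environment):

    # keep CRUSH from rebalancing while OSDs are down
    ceph osd set noout

    # on one host at a time, stop its OSDs
    service ceph stop osd.12
    service ceph stop osd.13

    # rerun the benchmark and see whether the stalls go away
    rados bench -p rbd 60 write

    # bring the OSDs back before moving on to the next host
    service ceph start osd.12
    service ceph start osd.13

    # once you're done testing
    ceph osd unset noout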
If you have a bunch of blocked ops, you could try a 'pg query' on the PGs
involved and see if there's a common OSD across all of your blocked ops.
In my experience, it's not necessarily the one doing the reporting.
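Something along these lines should show you the culprit (the pg id below is
just a made-up example; substitute the PGs your blocked ops mention):

    # see which requests are blocked and which OSDs they're sitting on
    ceph health detail

    # check the acting set of each PG involved
    ceph pg 3.1a7 query | grep -A 4 '"acting"'

    # or, more compactly
    ceph pg map 3.1a7

If the same OSD keeps showing up in the acting sets, that's the one I'd
look at first.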
Anecdotally, I've had trouble with Intel 10Gb NICs and custom kernels as
well. I've seen a NIC appear to be happy (no messages in dmesg, the machine
appears to be communicating normally, etc.), but when I went to iperf it,
I got pitifully low throughput (on the order of KB/s). I don't know what
kind of NICs you're using, but you may want to iperf everything just in
case.
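For reference, a quick pairwise check is enough to catch that (hostname is
just a placeholder):

    # on one node
    iperf -s

    # on another node, run a 10-second test against it
    iperf -c stor0111 -t 10

Anything far below line rate on a 10Gb link (I was seeing KB/s) is worth
chasing down.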
--Lincoln
On 9/7/2015 9:36 AM, Vickey Singh wrote:
Dear Experts
Can someone please help me figure out why my cluster is not able to write
data? See the output below: cur MB/s is 0 and avg MB/s is decreasing.
Ceph Hammer 0.94.2
CentOS 6 (kernel 3.10.69-1)
The Ceph status says ops are blocked. I have tried checking everything I
know:
- System resources (CPU, network, disk, memory) -- all normal
- 10G network for public and cluster network -- no saturation
- All disks are physically healthy
- No messages in /var/log/messages or dmesg
- Tried restarting the OSDs that are blocking operations, but no luck
- Tried writing through RBD and rados bench; both give the same problem
Please help me fix this problem.
# rados bench -p rbd 60 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or
0 objects
Object prefix: benchmark_data_stor1_1791844
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 125 109 435.873 436 0.022076 0.0697864
2 16 139 123 245.948 56 0.246578 0.0674407
3 16 139 123 163.969 0 - 0.0674407
4 16 139 123 122.978 0 - 0.0674407
5 16 139 123 98.383 0 - 0.0674407
6 16 139 123 81.9865 0 - 0.0674407
7 16 139 123 70.2747 0 - 0.0674407
8 16 139 123 61.4903 0 - 0.0674407
9 16 139 123 54.6582 0 - 0.0674407
10 16 139 123 49.1924 0 - 0.0674407
11 16 139 123 44.7201 0 - 0.0674407
12 16 139 123 40.9934 0 - 0.0674407
13 16 139 123 37.8401 0 - 0.0674407
14 16 139 123 35.1373 0 - 0.0674407
15 16 139 123 32.7949 0 - 0.0674407
16 16 139 123 30.7451 0 - 0.0674407
17 16 139 123 28.9364 0 - 0.0674407
18 16 139 123 27.3289 0 - 0.0674407
19 16 139 123 25.8905 0 - 0.0674407
2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 139 123 24.596 0 - 0.0674407
21 16 139 123 23.4247 0 - 0.0674407
22 16 139 123 22.36 0 - 0.0674407
23 16 139 123 21.3878 0 - 0.0674407
24 16 139 123 20.4966 0 - 0.0674407
25 16 139 123 19.6768 0 - 0.0674407
26 16 139 123 18.92 0 - 0.0674407
27 16 139 123 18.2192 0 - 0.0674407
28 16 139 123 17.5686 0 - 0.0674407
29 16 139 123 16.9628 0 - 0.0674407
30 16 139 123 16.3973 0 - 0.0674407
31 16 139 123 15.8684 0 - 0.0674407
32 16 139 123 15.3725 0 - 0.0674407
33 16 139 123 14.9067 0 - 0.0674407
34 16 139 123 14.4683 0 - 0.0674407
35 16 139 123 14.0549 0 - 0.0674407
36 16 139 123 13.6645 0 - 0.0674407
37 16 139 123 13.2952 0 - 0.0674407
38 16 139 123 12.9453 0 - 0.0674407
39 16 139 123 12.6134 0 - 0.0674407
2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
40 16 139 123 12.2981 0 - 0.0674407
41 16 139 123 11.9981 0 - 0.0674407
cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
health HEALTH_WARN
1 requests are blocked > 32 sec
monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
osdmap e19536: 50 osds: 50 up, 50 in
pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
91513 GB used, 47642 GB / 135 TB avail
2752 active+clean
Tried writing through RBD:
# dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
# dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
# dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com