I tested the dirty_ratio and dirty_background_ratio setting on ceph storage
server side.
It didn’t work.
The problem reproduced just now.
Some process stay in D state for over 10 minitues.
These osd requests is about to take several minutes.
Here are some logs:
[Thu Dec 6 17:59:18 2018] INFO: task java:11477 blocked for more than 120
seconds.
[Thu Dec 6 17:59:18 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[Thu Dec 6 17:59:18 2018] java D ffffffff816a4180 0 11477
8391 0x00000180
[Thu Dec 6 17:59:18 2018] ffff8807942bbbd0 0000000000000086 ffff8801f7e79fa0
ffff8807942bbfd8
[Thu Dec 6 17:59:18 2018] ffff8807942bbfd8 ffff8807942bbfd8 ffff8801f7e79fa0
ffff88086de56cc0
[Thu Dec 6 17:59:18 2018] 0000000000000000 7fffffffffffffff ffff88047ffa04e8
ffffffff816a4180
[Thu Dec 6 17:59:18 2018] Call Trace:
[Thu Dec 6 17:59:18 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
[Thu Dec 6 17:59:18 2018] [<ffffffff816a6059>] schedule+0x29/0x70
[Thu Dec 6 17:59:18 2018] [<ffffffff816a3b69>] schedule_timeout+0x239/0x2c0
[Thu Dec 6 17:59:18 2018] [<ffffffff81190541>] ? pagevec_lookup_tag+0x21/0x30
[Thu Dec 6 17:59:18 2018] [<ffffffffc081f598>] ?
ceph_writepages_start+0x1298/0x1520 [ceph]
[Thu Dec 6 17:59:18 2018] [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
[Thu Dec 6 17:59:18 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
[Thu Dec 6 17:59:18 2018] [<ffffffff816a56dd>] io_schedule_timeout+0xad/0x130
[Thu Dec 6 17:59:18 2018] [<ffffffff816a5778>] io_schedule+0x18/0x20
[Thu Dec 6 17:59:18 2018] [<ffffffff816a4191>] bit_wait_io+0x11/0x50
[Thu Dec 6 17:59:18 2018] [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
[Thu Dec 6 17:59:18 2018] [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
[Thu Dec 6 17:59:18 2018] [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
[Thu Dec 6 17:59:18 2018] [<ffffffff81181de1>]
__filemap_fdatawait_range+0x111/0x190
[Thu Dec 6 17:59:18 2018] [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
[Thu Dec 6 17:59:18 2018] [<ffffffff81181e74>]
filemap_fdatawait_range+0x14/0x30
[Thu Dec 6 17:59:18 2018] [<ffffffff81183db6>]
filemap_write_and_wait_range+0x56/0x90
[Thu Dec 6 17:59:18 2018] [<ffffffffc082726c>] ceph_fsync+0x6c/0x510 [ceph]
[Thu Dec 6 17:59:18 2018] [<ffffffff81137273>] ? __secure_computing+0x73/0x240
[Thu Dec 6 17:59:18 2018] [<ffffffff8111f5c6>] ?
__audit_syscall_exit+0x1e6/0x280
[Thu Dec 6 17:59:18 2018] [<ffffffff81231d85>] do_fsync+0x65/0xa0
[Thu Dec 6 17:59:18 2018] [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
[Thu Dec 6 17:59:18 2018] [<ffffffff816b1d4e>] tracesys+0xd9/0xde
[Thu Dec 6 18:03:38 2018] libceph: osd3 10.0.30.111:6808 socket closed (con
state OPEN)
[Thu Dec 6 18:03:39 2018] libceph: osd7 10.0.30.112:6807 socket closed (con
state OPEN)
[Thu Dec 6 18:03:39 2018] libceph: osd6 10.0.30.112:6805 socket closed (con
state OPEN)
cephfs kernel client side osdc:
REQUESTS 1548 homeless 0
124188 osd5 1.7e6c3097 [5,2]/5 [5,2]/5 100004474e2.000006bf
0x400024 1 0'0 write
124191 osd5 1.37d4eeac [5,2]/5 [5,2]/5 100004474e2.000006c2
0x400024 1 0'0 write
124192 osd5 1.2506d82d [5,1]/5 [5,1]/5 100004474e2.000006c3
0x400024 1 0'0 write
124198 osd5 1.5326c49c [5,3]/5 [5,3]/5 100004474e1.00000888
0x400024 1 0'0 write
124219 osd5 1.37203f43 [5,2]/5 [5,2]/5 100004474e2.000006cb
0x400024 1 0'0 write
124236 osd5 1.1318aeb5 [5,0]/5 [5,0]/5 100002adc43.00000000
0x400024 1 0'0 write
124268 osd5 1.b4368bad [5,0]/5 [5,0]/5 100004474e1.000008a2
0x400024 1 0'0 write
124272 osd5 1.a31398ad [5,0]/5 [5,0]/5 100004474e2.000006dc
0x400024 1 0'0 write
124279 osd5 1.f6363aff [5,0]/5 [5,0]/5 100004474e2.000006e3
0x400024 1 0'0 write
124281 osd5 1.e3386fca [5,2]/5 [5,2]/5 100004474e2.000006e5
0x400024 1 0'0 write
124288 osd5 1.a95ff0ac [5,2]/5 [5,2]/5 100004474e1.000008ab
0x400024 1 0'0 write
124290 osd5 1.4e9c48ff [5,0]/5 [5,0]/5 100004474e1.000008ad
0x400024 1 0'0 write
124293 osd5 1.e4721518 [5,2]/5 [5,2]/5 100004474e1.000008b0
0x400024 1 0'0 write
124299 osd5 1.fae2c501 [5,0]/5 [5,0]/5 100004474e2.000006ec
0x400024 1 0'0 write
124303 osd5 1.708289ad [5,0]/5 [5,0]/5 100004474e2.000006f0
0x400024 1 0'0 write
124312 osd5 1.85e93586 [5,0]/5 [5,0]/5 100004474e1.000008b8
0x400024 1 0'0 write
124314 osd5 1.85146f18 [5,2]/5 [5,2]/5 100004474e1.000008ba
0x400024 1 0'0 write
124316 osd5 1.b4c5875a [5,0]/5 [5,0]/5 100004474e2.000006f2
0x400024 1 0'0 write
124336 osd5 1.cc35e38e [5,2]/5 [5,2]/5 100004474e1.000008c5
0x400024 1 0'0 write
124339 osd5 1.c99de197 [5,2]/5 [5,2]/5 100004474e2.000006fe
0x400024 1 0'0 write
124364 osd5 1.21ea6cb7 [5,2]/5 [5,2]/5 100004474e2.0000070c
0x400024 1 0'0 write
124368 osd5 1.411148d8 [5,1]/5 [5,1]/5 100004474e2.00000710
0x400024 1 0'0 write
124370 osd5 1.c56129da [5,1]/5 [5,1]/5 100004474e2.00000712
0x400024 1 0'0 write
124380 osd5 1.dd7b1c12 [5,1]/5 [5,1]/5 100004474e1.000008db
0x400024 1 0'0 write
124384 osd5 1.1878e328 [5,2]/5 [5,2]/5 100004474e2.00000715
0x400024 1 0'0 write
124390 osd5 1.f992feda [5,1]/5 [5,1]/5 100004474e2.0000071b
0x400024 1 0'0 write
124396 osd5 1.84f2e0ff [5,0]/5 [5,0]/5 100004474e1.000008e0
0x400024 1 0'0 write
124398 osd5 1.76e9cbb5 [5,0]/5 [5,0]/5 100004474e1.000008e2
0x400024 1 0'0 write
124405 osd5 1.3a4c5001 [5,0]/5 [5,0]/5 100004474e2.0000071f
0x400024 1 0'0 write
124413 osd5 1.91f47ada [5,1]/5 [5,1]/5 100004474e2.00000727
0x400024 1 0'0 write
124420 osd5 1.da08fa82 [5,1]/5 [5,1]/5 100004474e1.000008ed
0x400024 1 0'0 write
124422 osd5 1.f0dc5efd [5,3]/5 [5,3]/5 100004474e1.000008ef
0x400024 1 0'0 write
124435 osd5 1.28ba590a [5,2]/5 [5,2]/5 100004474e2.00000732
0x400024 1 0'0 write
124442 osd5 1.21439cf [5,1]/5 [5,1]/5 100004474e1.000008f8
0x400024 1 0'0 write
ceph server side dumps:
# ceph daemon osd.5 dump_blocked_ops
{
"ops": [],
"complaint_time": 30.000000,
"num_blocked_ops": 0
}
# ceph daemon osd.5 dump_ops_in_flight
{
"ops": [],
"num_ops": 0
}
# ceph daemon osd.5 dump_historic_slow_ops
{
"num to keep": 20,
"threshold to keep": 10,
"Ops": [
{
"description": "osd_op(client.447197.1:71890 1.b7884c19
1:983211ed:::100004342f9.000015e6:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.573307",
"age": 649.893033,
"duration": 13.827240,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71890
},
"events": [
{
"time": "2018-12-06 18:04:10.573307",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.573307",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.573309",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.595225",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.595278",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.595291",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:22.970164",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:22.970227",
"event": "started"
},
{
"time": "2018-12-06 18:04:22.970273",
"event": "waiting for subops from 6"
},
{
"time": "2018-12-06 18:04:23.323234",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.400524",
"event": "sub_op_commit_rec from 6"
},
{
"time": "2018-12-06 18:04:24.400541",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.400547",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71905 1.b5503364
1:26cc0aad:::100004342fb.00000c45:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.620624",
"age": 649.845717,
"duration": 12.702676,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71905
},
"events": [
{
"time": "2018-12-06 18:04:10.620624",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.620624",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.620626",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.635423",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.635476",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.635485",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.057810",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.057896",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.057945",
"event": "waiting for subops from 7"
},
{
"time": "2018-12-06 18:04:23.237036",
"event": "sub_op_commit_rec from 7"
},
{
"time": "2018-12-06 18:04:23.323250",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.323285",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.323300",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71913 1.9b59c75f
1:fae39ad9:::100004342fb.00000c4d:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.649445",
"age": 649.816895,
"duration": 12.762881,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71913
},
"events": [
{
"time": "2018-12-06 18:04:10.649445",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.649445",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.649447",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.667713",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.667783",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.667803",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.059423",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.059494",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.059540",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:23.147873",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:23.412282",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.412312",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.412326",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71924 1.43f2a896
1:69154fc2:::100004342f9.000015f9:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.680689",
"age": 649.785651,
"duration": 12.731666,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71924
},
"events": [
{
"time": "2018-12-06 18:04:10.680689",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.680689",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.680692",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.702583",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.702627",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.702634",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.147879",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.147979",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.148065",
"event": "waiting for subops from 4"
},
{
"time": "2018-12-06 18:04:23.237049",
"event": "sub_op_commit_rec from 4"
},
{
"time": "2018-12-06 18:04:23.412333",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.412349",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.412356",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71935 1.df579f5f
1:faf9eafb:::100004342fb.00000c54:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.751790",
"age": 649.714550,
"duration": 12.747687,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71935
},
"events": [
{
"time": "2018-12-06 18:04:10.751790",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.751790",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.751792",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.764998",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.765049",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.765059",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.149915",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.149990",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.150037",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:23.237059",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:23.499434",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.499466",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.499477",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71948 1.30315755
1:aaea8c0c:::100004342f9.00001602:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.765065",
"age": 649.701276,
"duration": 12.734734,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71948
},
"events": [
{
"time": "2018-12-06 18:04:10.765065",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.765065",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.765067",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.789073",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.789125",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.789136",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.237064",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.237142",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.237187",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:23.323299",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:23.499499",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.499513",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.499799",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71952 1.e0451baf
1:f5d8a207:::100004342f9.00001606:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.789142",
"age": 649.677198,
"duration": 12.809745,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71952
},
"events": [
{
"time": "2018-12-06 18:04:10.789142",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.789142",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.789145",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.802180",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.802240",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.802249",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.238710",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.238776",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.238890",
"event": "waiting for subops from 7"
},
{
"time": "2018-12-06 18:04:23.323284",
"event": "sub_op_commit_rec from 7"
},
{
"time": "2018-12-06 18:04:23.598857",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.598878",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.598888",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:71987 1.87226b4b
1:d2d644e1:::100004342f9.0000161a:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.875449",
"age": 649.590892,
"duration": 12.723466,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 71987
},
"events": [
{
"time": "2018-12-06 18:04:10.875449",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.875449",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.875450",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.889074",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.889109",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.889113",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.323306",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.323402",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.323465",
"event": "waiting for subops from 4"
},
{
"time": "2018-12-06 18:04:23.412387",
"event": "sub_op_commit_rec from 4"
},
{
"time": "2018-12-06 18:04:23.598895",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.598908",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.598915",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72009 1.6e26a0cd
1:b3056476:::100004342f9.00001621:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.951671",
"age": 649.514669,
"duration": 13.559266,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72009
},
"events": [
{
"time": "2018-12-06 18:04:10.951671",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.951671",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.951673",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.971212",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.971265",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.971274",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.325006",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.325073",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.325120",
"event": "waiting for subops from 6"
},
{
"time": "2018-12-06 18:04:23.689393",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.510912",
"event": "sub_op_commit_rec from 6"
},
{
"time": "2018-12-06 18:04:24.510930",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.510937",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72027 1.2675011e
1:7880ae64:::100004342fb.00000c83:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.983790",
"age": 649.482551,
"duration": 13.631767,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72027
},
"events": [
{
"time": "2018-12-06 18:04:10.983790",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.983790",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.983791",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:10.997441",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:10.997486",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:10.997493",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.412425",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.412492",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.412539",
"event": "waiting for subops from 6"
},
{
"time": "2018-12-06 18:04:23.782075",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.615518",
"event": "sub_op_commit_rec from 6"
},
{
"time": "2018-12-06 18:04:24.615545",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.615556",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72047 1.20d53eaf
1:f57cab04:::100004342f9.00001638:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:10.997499",
"age": 649.468842,
"duration": 12.894479,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72047
},
"events": [
{
"time": "2018-12-06 18:04:10.997499",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:10.997499",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:10.997500",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.010879",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.010920",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.010925",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.499523",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.499599",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.499652",
"event": "waiting for subops from 7"
},
{
"time": "2018-12-06 18:04:23.598911",
"event": "sub_op_commit_rec from 7"
},
{
"time": "2018-12-06 18:04:23.891906",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.891960",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.891977",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72114 1.5c7de55
1:aa7be3a0:::100004342fb.00000cad:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.111490",
"age": 649.354851,
"duration": 12.780517,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72114
},
"events": [
{
"time": "2018-12-06 18:04:11.111490",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.111490",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.111492",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.127195",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.127236",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.127242",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.598920",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.598999",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.599048",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:23.688752",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:23.891986",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:23.892001",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:23.892007",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72124 1.59941c4b
1:d238299a:::100004342fb.00000cb7:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.127248",
"age": 649.339093,
"duration": 12.962263,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72124
},
"events": [
{
"time": "2018-12-06 18:04:11.127248",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.127248",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.127250",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.137801",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.137837",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.137842",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.600686",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.600750",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.600795",
"event": "waiting for subops from 4"
},
{
"time": "2018-12-06 18:04:23.688740",
"event": "sub_op_commit_rec from 4"
},
{
"time": "2018-12-06 18:04:24.089450",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.089493",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.089511",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72136 1.4c3df65f
1:fa6fbc32:::100004342f9.00001664:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.161272",
"age": 649.305069,
"duration": 12.928269,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72136
},
"events": [
{
"time": "2018-12-06 18:04:11.161272",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.161272",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.161273",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.176621",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.176665",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.176672",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.688809",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.688878",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.688924",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:23.781472",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:24.089520",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.089534",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.089541",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72202 1.de800dcd
1:b3b0017b:::100004342fb.00000cd8:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.235424",
"age": 649.230917,
"duration": 13.494299,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72202
},
"events": [
{
"time": "2018-12-06 18:04:11.235424",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.235424",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.235425",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.244364",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.244400",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.244406",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.690409",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.690473",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.690519",
"event": "waiting for subops from 6"
},
{
"time": "2018-12-06 18:04:24.175400",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.729691",
"event": "sub_op_commit_rec from 6"
},
{
"time": "2018-12-06 18:04:24.729714",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.729723",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72222 1.bc0cdac3
1:c35b303d:::100004342f9.00001694:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.283182",
"age": 649.183159,
"duration": 12.980151,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72222
},
"events": [
{
"time": "2018-12-06 18:04:11.283182",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.283182",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.283183",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.290043",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.290087",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.290095",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.781478",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.781545",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.781591",
"event": "waiting for subops from 7"
},
{
"time": "2018-12-06 18:04:23.892055",
"event": "sub_op_commit_rec from 7"
},
{
"time": "2018-12-06 18:04:24.263304",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.263325",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.263333",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72237 1.567a0855
1:aa105e6a:::100004342f9.000016a3:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.305424",
"age": 649.160916,
"duration": 13.052176,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72237
},
"events": [
{
"time": "2018-12-06 18:04:11.305424",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.305424",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.305426",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.312298",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.312335",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.312340",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:23.892064",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:23.892156",
"event": "started"
},
{
"time": "2018-12-06 18:04:23.892216",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:24.089563",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:24.357583",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.357594",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.357600",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72249 1.960d9955
1:aa99b069:::100004342f9.000016af:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.319524",
"age": 649.146817,
"duration": 13.038095,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72249
},
"events": [
{
"time": "2018-12-06 18:04:11.319524",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.319524",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.319525",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.327928",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.327967",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.327973",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:24.089569",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:24.089666",
"event": "started"
},
{
"time": "2018-12-06 18:04:24.089730",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:04:24.175469",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:04:24.357605",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.357614",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.357618",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:72256 1.38b4facd
1:b35f2d1c:::100004342f9.000016b6:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:04:11.334694",
"age": 649.131646,
"duration": 13.615766,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 72256
},
"events": [
{
"time": "2018-12-06 18:04:11.334694",
"event": "initiated"
},
{
"time": "2018-12-06 18:04:11.334694",
"event": "header_read"
},
{
"time": "2018-12-06 18:04:11.334696",
"event": "throttled"
},
{
"time": "2018-12-06 18:04:11.340039",
"event": "all_read"
},
{
"time": "2018-12-06 18:04:11.340076",
"event": "dispatched"
},
{
"time": "2018-12-06 18:04:11.340082",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:04:24.091372",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:04:24.091440",
"event": "started"
},
{
"time": "2018-12-06 18:04:24.091490",
"event": "waiting for subops from 6"
},
{
"time": "2018-12-06 18:04:24.359630",
"event": "op_commit"
},
{
"time": "2018-12-06 18:04:24.950426",
"event": "sub_op_commit_rec from 6"
},
{
"time": "2018-12-06 18:04:24.950451",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:04:24.950460",
"event": "done"
}
]
}
},
{
"description": "osd_op(client.447197.1:75635 1.751883c0
1:03c118ae:::100004342f9.00001a19:head [write 0~4194304] snapc 1=[]
ondisk+write+known_if_redirected e75)",
"initiated_at": "2018-12-06 18:05:16.411107",
"age": 584.055234,
"duration": 10.290467,
"type_data": {
"flag_point": "commit sent; apply or cleanup",
"client_info": {
"client": "client.447197",
"client_addr": "10.0.30.110:0/3181790720",
"tid": 75635
},
"events": [
{
"time": "2018-12-06 18:05:16.411107",
"event": "initiated"
},
{
"time": "2018-12-06 18:05:16.411107",
"event": "header_read"
},
{
"time": "2018-12-06 18:05:16.411108",
"event": "throttled"
},
{
"time": "2018-12-06 18:05:16.416864",
"event": "all_read"
},
{
"time": "2018-12-06 18:05:16.416903",
"event": "dispatched"
},
{
"time": "2018-12-06 18:05:16.416909",
"event": "queued_for_pg"
},
{
"time": "2018-12-06 18:05:19.383296",
"event": "reached_pg"
},
{
"time": "2018-12-06 18:05:19.383394",
"event": "started"
},
{
"time": "2018-12-06 18:05:19.383443",
"event": "waiting for subops from 5"
},
{
"time": "2018-12-06 18:05:19.957547",
"event": "op_commit"
},
{
"time": "2018-12-06 18:05:26.701524",
"event": "sub_op_commit_rec from 5"
},
{
"time": "2018-12-06 18:05:26.701557",
"event": "commit_sent"
},
{
"time": "2018-12-06 18:05:26.701574",
"event": "done"
}
]
}
}
]
}
—
Best Regards
Li, Ning
> On Dec 6, 2018, at 14:31, Yan, Zheng <[email protected]> wrote:
>
> On Thu, Dec 6, 2018 at 2:24 PM Li,Ning <[email protected]> wrote:
>>
>> Hi,
>>
>> Overload on the storage nodes that run OSDs is normal during the whole copy.
>>
>> The memory pressure is on the client side. I think it may not work to turn
>> the dirty_ratio and dirty_background_ratio to small on the ceph storage
>> server side.
>>
>>
>> Anyway,I will have a try.
>>
>> —
>> Best Regards
>> Li, Ning
>>
>>
>>
>>> On Dec 6, 2018, at 11:41, Yan, Zheng <[email protected]> wrote:
>>>
>>> On Wed, Dec 5, 2018 at 2:33 PM NingLi <[email protected]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> We found that some process writing cephfs will hang for a long time (>
>>>> 120s) when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app
>>>> node's cephfs mountpoint.
>>>>
>>>> This problem is not always reproduciable. But when this problem occurs,
>>>> the web(nginx) or some other services can’t be access, it’s embarrassing.
>>>>
>>>>
>>>> Please give some tips.
>>>>
>>>>
>>>> We are using ceph version 13.2.1 and 3.10.0-693.el7.x86_64 kernel.
>>>>
>>>>
>>>> When this problem occurs, the memory pressure is a bit high.
>>>> # free -h
>>>> total used free shared buff/cache
>>>> available
>>>> Mem: 30G 11G 230M 3.3M 18G 18G
>>>> Swap: 0B 0B 0B
>>>>
>>>> # mpstat -P ALL
>>>> Average: CPU %usr %nice %sys %iowait %irq %soft %steal
>>>> %guest %gnice %idle
>>>> Average: all 3.80 0.04 1.46 55.12 0.00 0.08
>>>> 0.00 0.00 0.00 39.49
>>>> Average: 0 1.52 0.00 0.51 97.98 0.00 0.00
>>>> 0.00 0.00 0.00 0.00
>>>> Average: 1 13.07 0.00 1.51 0.00 0.00 0.00
>>>> 0.00 0.00 0.00 85.43
>>>> Average: 2 3.00 0.00 3.00 0.50 0.00 0.00
>>>> 0.00 0.00 0.00 93.50
>>>> Average: 3 2.50 0.00 2.00 0.00 0.00 0.00
>>>> 0.00 0.00 0.00 95.50
>>>> Average: 4 1.50 0.50 1.50 96.00 0.00 0.50
>>>> 0.00 0.00 0.00 0.00
>>>> Average: 5 0.50 0.00 1.01 32.16 0.00 0.00
>>>> 0.00 0.00 0.00 66.33
>>>> Average: 6 2.49 0.00 2.49 0.00 0.00 0.50
>>>> 0.00 0.00 0.00 94.53
>>>> Average: 7 1.52 0.00 0.51 97.98 0.00 0.00
>>>> 0.00 0.00 0.00 0.00
>>>> Average: 8 1.50 0.00 1.50 97.00 0.00 0.00
>>>> 0.00 0.00 0.00 0.00
>>>> Average: 9 14.07 0.00 2.01 46.23 0.00 0.00
>>>> 0.00 0.00 0.00 37.69
>>>> Average: 10 2.00 0.00 0.50 97.50 0.00 0.00
>>>> 0.00 0.00 0.00 0.00
>>>> Average: 11 2.01 0.00 1.01 96.98 0.00 0.00
>>>> 0.00 0.00 0.00 0.00
>>>>
>>>> We can see some process in state ‘D’ for a long time. here are 2
>>>> fragments in dmesg.
>>>>
>>>
>>> try setting vm.dirty_ration and vm.dirty_background_ration of machines
>>> that run OSDs to smaller value.
>>>
>>>> [Fri Nov 30 15:10:43 2018] INFO: task java:11662 blocked for more than 120
>>>> seconds.
>>>> [Fri Nov 30 15:10:43 2018] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Fri Nov 30 15:10:43 2018] java D ffffffff816a4180 0 11662
>>>> 10409 0x00000180
>>>> [Fri Nov 30 15:10:43 2018] ffff8805f4febbd0 0000000000000082
>>>> ffff8807e2edcf10 ffff8805f4febfd8
>>>> [Fri Nov 30 15:10:43 2018] ffff8805f4febfd8 ffff8805f4febfd8
>>>> ffff8807e2edcf10 ffff88086df56cc0
>>>> [Fri Nov 30 15:10:43 2018] 0000000000000000 7fffffffffffffff
>>>> ffff88087ffc3be8 ffffffff816a4180
>>>> [Fri Nov 30 15:10:43 2018] Call Trace:
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a6059>] schedule+0x29/0x70
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a3b69>]
>>>> schedule_timeout+0x239/0x2c0
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81190541>] ?
>>>> pagevec_lookup_tag+0x21/0x30
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffffc082b598>] ?
>>>> ceph_writepages_start+0x1298/0x1520 [ceph]
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a56dd>]
>>>> io_schedule_timeout+0xad/0x130
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a5778>] io_schedule+0x18/0x20
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816a3cb5>] __wait_on_bit+0x65/0x90
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff810b19d0>] ?
>>>> wake_bit_function+0x40/0x40
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81181de1>]
>>>> __filemap_fdatawait_range+0x111/0x190
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff811b0d79>] ? do_numa_page+0x159/0x1e0
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff8118f42e>] ? do_writepages+0x1e/0x40
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81181e74>]
>>>> filemap_fdatawait_range+0x14/0x30
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81183db6>]
>>>> filemap_write_and_wait_range+0x56/0x90
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffffc083326c>] ceph_fsync+0x6c/0x510
>>>> [ceph]
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81137273>] ?
>>>> __secure_computing+0x73/0x240
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff8111f5c6>] ?
>>>> __audit_syscall_exit+0x1e6/0x280
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81231d85>] do_fsync+0x65/0xa0
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff81232083>] SyS_fdatasync+0x13/0x20
>>>> [Fri Nov 30 15:10:43 2018] [<ffffffff816b1d4e>] tracesys+0xd9/0xde
>>>>
>>>>
>>>> [Fri Nov 30 15:12:43 2018] INFO: task prometheus:9919 blocked for more
>>>> than 120 seconds.
>>>> [Fri Nov 30 15:12:43 2018] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [Fri Nov 30 15:12:43 2018] prometheus D ffff8807e87dbad8 0 9919
>>>> 9879 0x00000080
>>>> [Fri Nov 30 15:12:43 2018] ffff8807e87db970 0000000000000082
>>>> ffff8803e273cf10 ffff8807e87dbfd8
>>>> [Fri Nov 30 15:12:43 2018] ffff8807e87dbfd8 ffff8807e87dbfd8
>>>> ffff8803e273cf10 ffff88046e716cc0
>>>> [Fri Nov 30 15:12:43 2018] 0000000000000000 7fffffffffffffff
>>>> ffffffff816a4180 ffff8807e87dbad8
>>>> [Fri Nov 30 15:12:43 2018] Call Trace:
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a6059>] schedule+0x29/0x70
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a3b69>]
>>>> schedule_timeout+0x239/0x2c0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff81090007>] ?
>>>> local_bh_enable+0x17/0x20
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a4180>] ? bit_wait+0x50/0x50
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a56dd>]
>>>> io_schedule_timeout+0xad/0x130
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a5778>] io_schedule+0x18/0x20
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a4191>] bit_wait_io+0x11/0x50
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816a3eaf>]
>>>> __wait_on_bit_lock+0x5f/0xc0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff81182094>] __lock_page+0x74/0x90
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff810b19d0>] ?
>>>> wake_bit_function+0x40/0x40
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff81182a34>] __find_lock_page+0x54/0x70
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff811836e2>]
>>>> grab_cache_page_write_begin+0x62/0xd0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffffc082a203>]
>>>> ceph_write_begin+0x43/0xe0 [ceph]
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff811824ce>]
>>>> generic_file_buffered_write+0x11e/0x2a0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffffc08259c4>]
>>>> ceph_aio_write+0xac4/0xc60 [ceph]
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff8160417a>] ? inet_recvmsg+0x7a/0xa0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff811feb3d>] do_sync_write+0x8d/0xd0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff811ff5fd>] vfs_write+0xbd/0x1e0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff8120040f>] SyS_write+0x7f/0xe0
>>>> [Fri Nov 30 15:12:43 2018] [<ffffffff816b1b49>]
>>>> system_call_fastpath+0x16/0x1b
>>>>
>>>>
>>>> But the cephfs status shows healthy.
>>>> # ceph -s
>>>> cluster:
>>>> id: b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>> health: HEALTH_OK
>>>>
>>>> services:
>>>> mon: 2 daemons, quorum skyaxe-storage-0,skyaxe-storage-1
>>>> mgr: skyaxe-storage-0(active), standbys: skyaxe-storage-1
>>>> mds: cephfs-1/1/1 up {0=skyaxe-storage-1=up:active}, 1 up:standby
>>>> osd: 8 osds: 8 up, 8 in
>>>>
>>>> data:
>>>> pools: 2 pools, 512 pgs
>>>> objects: 599.7 k objects, 876 GiB
>>>> usage: 2.6 TiB used, 13 TiB / 15 TiB avail
>>>> pgs: 512 active+clean
>>>>
>>>> io:
>>>> client: 3.7 KiB/s wr, 0 op/s rd, 1 op/s wr
>>>>
>>>>
>>>> We can see some unfinished requests in osdc.
>>>>
>>>> # cat /sys/kernel/debug/ceph/xxx/osdc
>>>> REQUESTS 98 homeless 0
>>>> 8529 osd1 1.1bfdee5 [1,7]/1 [1,7]/1 100000011be.00000000
>>>> 0x400024 1 0'0 write
>>>> 8532 osd1 1.2d81c338 [1,6]/1 [1,6]/1 100000011c0.00000000
>>>> 0x400024 1 0'0 write
>>>> 8536 osd1 1.49324f26 [1,4]/1 [1,4]/1 100000011c4.00000000
>>>> 0x400024 1 0'0 write
>>>> 8542 osd1 1.1403106f [1,7]/1 [1,7]/1 100000011c8.00000000
>>>> 0x400024 1 0'0 write
>>>> 8549 osd1 1.ea273113 [1,4]/1 [1,4]/1 100000011ce.00000000
>>>> 0x400024 1 0'0 write
>>>> 8562 osd1 1.f2d68e9a [1,5]/1 [1,5]/1 100000011d8.00000000
>>>> 0x400024 1 0'0 write
>>>> 8566 osd1 1.1f212283 [1,4]/1 [1,4]/1 100000011db.00000000
>>>> 0x400024 1 0'0 write
>>>> 8568 osd1 1.2e1984e7 [1,5]/1 [1,5]/1 100000011de.00000000
>>>> 0x400024 1 0'0 write
>>>> 8576 osd1 1.da7e63d0 [1,5]/1 [1,5]/1 100000011e1.00000000
>>>> 0x400024 1 0'0 write
>>>> 8581 osd1 1.3b0d280 [1,4]/1 [1,4]/1 100000011e8.00000000
>>>> 0x400024 1 0'0 write
>>>> 8582 osd1 1.80227f0b [1,5]/1 [1,5]/1 100000011ea.00000000
>>>> 0x400024 1 0'0 write
>>>> 8593 osd1 1.1efd9391 [1,6]/1 [1,6]/1 100000011ef.00000000
>>>> 0x400024 1 0'0 write
>>>> 8594 osd1 1.3db6e880 [1,4]/1 [1,4]/1 100000011f0.00000000
>>>> 0x400024 1 0'0 write
>>>> 8611 osd1 1.6d20c41d [1,4]/1 [1,4]/1 1000000028c.00000000
>>>> 0x400024 1 0'0 write
>>>> 8612 osd1 1.6d20c41d [1,4]/1 [1,4]/1 1000000028c.00000000
>>>> 0x400024 1 0'0 write
>>>> 8616 osd1 1.f4d90a4d [1,7]/1 [1,7]/1 10000001206.00000000
>>>> 0x400024 1 0'0 write
>>>> 8619 osd1 1.740514fe [1,5]/1 [1,5]/1 10000001208.00000000
>>>> 0x400024 1 0'0 write
>>>> 8630 osd1 1.d5fbabe0 [1,6]/1 [1,6]/1 10000001216.00000000
>>>> 0x400024 1 0'0 write
>>>> 8631 osd1 1.daf28f83 [1,4]/1 [1,4]/1 10000001217.00000000
>>>> 0x400024 1 0'0 write
>>>> 8635 osd1 1.d29e01a6 [1,5]/1 [1,5]/1 1000000121a.00000000
>>>> 0x400024 1 0'0 write
>>>> 8640 osd1 1.2de0ff63 [1,7]/1 [1,7]/1 10000001220.00000000
>>>> 0x400024 1 0'0 write
>>>> 8658 osd1 1.2f91206f [1,7]/1 [1,7]/1 1000000122c.00000000
>>>> 0x400064 1 0'0 write,startsync
>>>> ...
>>>>
>
> how long did these osd requests take? If some requests take too long,
> use 'ceph daemon osd.x dump dump_blocked_ops/dump_ops_in_flight' to
> check
>
>
>>>
>>>> Here is our ceph config:
>>>>
>>>> # cat /etc/ceph/ceph.conf
>>>> [global]
>>>> fsid = b27bc9a3-2e32-4782-b60f-b7901b0b26e5
>>>> mon_initial_members = skyaxe-storage-0, skyaxe-storage-1
>>>> mon_host = 10.0.30.111,10.0.30.112
>>>> auth_cluster_required = cephx
>>>> auth_service_required = cephx
>>>> auth_client_required = cephx
>>>>
>>>> osd_pool_default_size = 2
>>>> public_network = 10.0.30.0/24
>>>> cluster_network = 10.0.40.0/24
>>>> osd_pool_default_min_size = 1
>>>>
>>>> max_open_files = 131072
>>>>
>>>> [osd]
>>>> osd_journal_size = 56320
>>>>
>>>> journal_max_write_bytes = 1073741824
>>>> journal_max_write_entries = 10000
>>>> journal_queue_max_ops = 50000
>>>> journal_queue_max_bytes = 10737418240
>>>>
>>>> osd_mkfs_type = xfs
>>>> osd_mkfs_options_xfs = -f
>>>> osd_mount_options_xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
>>>>
>>>> osd_max_write_size = 512
>>>> osd_client_message_size_cap = 2147483648
>>>> osd_deep_scrub_stride = 131072
>>>> osd_op_threads = 8
>>>> osd_disk_threads = 4
>>>> osd_map_cache_size = 1024
>>>> osd_map_cache_bl_size = 128
>>>> osd_recovery_op_priority = 4
>>>> osd_recovery_max_active = 10
>>>> osd_max_backfills = 4
>>>>
>>>> —
>>>> Best Regards
>>>> Li, Ning
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> [email protected]
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com