By looking at which operations are slow in the output of your dump_*_ops commands. We've found that it's best to move all of the metadata for RGW onto SSDs, i.e., all pools except the actual data pool.
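As a rough sketch only (the rule name is made up, and the pool names assume a default RGW zone; check `ceph osd pool ls` for yours), pinning the non-data pools to SSD-class OSDs looks like:

```shell
# Create a replicated CRUSH rule that only selects SSD-class OSDs
# (device classes are available since Luminous).
ceph osd crush rule create-replicated rgw-meta-ssd default host ssd

# Move every RGW pool except the bucket data pool onto that rule.
for pool in .rgw.root default.rgw.control default.rgw.meta \
            default.rgw.log default.rgw.buckets.index; do
    ceph osd pool set "$pool" crush_rule rgw-meta-ssd
done
```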
But that depends on your use case and whether the slow requests you are seeing are actually a problem for you. Maybe you don't care too much about the metadata operations.

Paul

2018-05-16 12:25 GMT+02:00 Grigory Murashov <[email protected]>:

> Hello Paul!
>
> Thanks for your answer.
>
> How did you understand it's RGW metadata stuff?
>
> No, I don't use any SSDs. Where can I find out more about metadata pools, using SSDs etc.?
>
> Thanks.
>
> Grigory Murashov
> Voximplant
>
> 15.05.2018 23:42, Paul Emmerich wrote:
>
> Looks like it's mostly RGW metadata stuff; are you running your non-data RGW pools on SSDs (you should, that can help *a lot*)?
>
> Paul
>
> 2018-05-15 18:49 GMT+02:00 Grigory Murashov <[email protected]>:
>
>> Hello guys!
>>
>> I collected the output of ceph daemon osd.16 dump_ops_in_flight and ceph daemon osd.16 dump_historic_ops.
>>
>> Here is the output of ceph health detail at the moment of the problem:
>>
>> HEALTH_WARN 20 slow requests are blocked > 32 sec
>> REQUEST_SLOW 20 slow requests are blocked > 32 sec
>>     20 ops are blocked > 65.536 sec
>>     osds 16,27,29 have blocked requests > 65.536 sec
>>
>> So I grabbed logs from osd.16.
>>
>> The file is attached. Could you please help interpret it?
>>
>> Thanks in advance.
>>
>> Grigory Murashov
>> Voximplant
>>
>> 14.05.2018 18:14, Grigory Murashov wrote:
>>
>> Hello David!
>>
>> 2. I set it up to 10/10.
>>
>> 3. Thanks, my problem was that I did it on a host where there was no osd.15 daemon.
>>
>> Could you please help to read the osd logs?
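As a small sketch for the archive: the affected OSD ids can be pulled straight out of health lines like the ones above with ordinary text tools. The helper name is made up; the patterns match the two formats shown in the health output ("osds 16,27,29 have blocked requests" and "osd.21 has blocked requests"):

```shell
# Print one OSD id per line from "blocked requests" health lines.
extract_slow_osds() {
    grep -oE 'osds? ?\.?[0-9][0-9,]*' |   # "osds 16,27,29" or "osd.21"
        sed -e 's/^osds* *\.*//' |        # strip the "osd"/"osds" prefix
        tr ',' '\n' | sort -un            # one numeric id per line
}

# Typical use against the cluster log:
#   grep 'blocked requests' /var/log/ceph/ceph.log | extract_slow_osds
```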
>>
>> Here is a part from ceph.log:
>>
>> 2018-05-14 13:46:32.644323 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553895 : cluster [INF] Cluster is now healthy
>> 2018-05-14 13:46:43.741921 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553896 : cluster [WRN] Health check failed: 21 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:49.746994 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553897 : cluster [WRN] Health check update: 23 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:46:55.752314 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553900 : cluster [WRN] Health check update: 3 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:01.030686 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553901 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:13.770833 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553904 : cluster [WRN] Health check update: 21 slow requests are blocked > 32 sec (REQUEST_SLOW)
>> 2018-05-14 13:47:17.774530 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553905 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 12 slow requests are blocked > 32 sec)
>> 2018-05-14 13:47:17.774582 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553906 : cluster [INF] Cluster is now healthy
>>
>> At 13:47 I had a problem with osd.21
>>
>> 1.
Ceph Health (storage-ru1-osd1.voximplant.com:ceph.health): HEALTH_WARN >> {u'REQUEST_SLOW': {u'severity': u'HEALTH_WARN', u'summary': {u'message': u'4 >> slow requests are blocked > 32 sec'}}} >> HEALTH_WARN 4 slow requests are blocked > 32 sec >> REQUEST_SLOW 4 slow requests are blocked > 32 sec >> 2 ops are blocked > 65.536 sec >> 2 ops are blocked > 32.768 sec >> osd.21 has blocked requests > 65.536 sec >> >> Here is a part from ceph-osd.21.log >> >> 2018-05-14 13:47:06.891399 7fb806dd6700 10 osd.21 pg_epoch: 236 pg[2.0( v >> 236'297 (0'0,236'297] local-lis/les=223/224 n=1 ec=119/119 lis/c 223/223 >> les/c/f 224/224/0 223/223/212) [21,29,15] >> r=0 lpr=223 crt=236'297 lcod 236'296 mlcod 236'296 active+clean] >> dropping ondisk_read_lock >> 2018-05-14 13:47:06.891435 7fb806dd6700 10 osd.21 236 dequeue_op >> 0x56453b753f80 finish >> 2018-05-14 13:47:07.111388 7fb8185f9700 10 osd.21 236 tick >> 2018-05-14 13:47:07.111398 7fb8185f9700 10 osd.21 236 do_waiters -- start >> 2018-05-14 13:47:07.111401 7fb8185f9700 10 osd.21 236 do_waiters -- finish >> 2018-05-14 13:47:07.800421 7fb817df8700 10 osd.21 236 >> tick_without_osd_lock >> 2018-05-14 13:47:07.800444 7fb817df8700 10 osd.21 236 >> promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 bytes; >> target 25 obj/sec or 5120 k bytes/sec >> 2018-05-14 13:47:07.800449 7fb817df8700 10 osd.21 236 >> promote_throttle_recalibrate actual 0, actual/prob ratio 1, adjusted >> new_prob 1000, prob 1000 -> 1000 >> 2018-05-14 13:47:08.111470 7fb8185f9700 10 osd.21 236 tick >> 2018-05-14 13:47:08.111483 7fb8185f9700 10 osd.21 236 do_waiters -- start >> 2018-05-14 13:47:08.111485 7fb8185f9700 10 osd.21 236 do_waiters -- finish >> 2018-05-14 13:47:08.181070 7fb8055d3700 10 osd.21 236 dequeue_op >> 0x564539651000 prio 63 cost 0 latency 0.000143 >> osd_op(client.2597258.0:213844298 6.1d4 6.4079fd4 (undecoded) >> ondisk+read+kno >> wn_if_redirected e236) v8 pg pg[6.1d4( v 236'20882 (236'19289,236'20882] >> 
local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 >> 223/223/212) [21,29,17] r=0 lpr=223 crt=236 >> '20882 lcod 236'20881 mlcod 236'20881 active+clean] >> 2018-05-14 13:47:08.181112 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 >> active+clean] _handle_message: 0x564539651000 >> 2018-05-14 13:47:08.181141 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 >> active+clean] do_op osd_op(client.2597258.0:213844298 6.1d4 >> 6:2bf9e020:::eb359f44-3316-4cd3-9006-d416c21e0745.120446 >> 4.6_2018%2f05%2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZ >> lYzQvaHR0cDovL3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmR >> zLzIwMTgvMDUvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM- >> :head [getxattrs,stat,read 0~4194304] snapc 0=[] >> ondisk+read+known_if_redirected e236) v8 may_read -> read-ordered flags >> ondisk+read+known_if_redirected >> 2018-05-14 13:47:08.181179 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 >> active+clean] get_object_context: found obc in cache: 0x5645b98fdb80 >> 2018-05-14 13:47:08.181193 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 >> active+clean] get_object_context: 0x5645b98fdb80 >> 
6:2bf9e020:::eb359f44-3316-4cd3-9006-d416c21e0745.1204464.6_2018%2f >> 05%2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cD >> ovL3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMD >> UvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head rwsta >> te(none n=0 w=0) oi: 6:2bf9e020:::eb359f44-3316-4cd >> 3-9006-d416c21e0745.1204464.6_2018%2f05%2f14%2fYWRjNmZmNzQzO >> DI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL3d3dy1sdS0wMS0zNi52b >> 3hpbXBsYW50LmNvbS9 >> yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ >> 0Ljk2NjQ5MS5tcDM-:head(236'20762 client.2597258.0:212394860 >> dirty|data_digest|omap_digest s 708737 uv 20762 dd c68e9ed od fffffff >> f alloc_hint [0 0 0]) exists: 1 ssc: 0x564542a6a640 snapset: 0=[]:{} >> 2018-05-14 13:47:08.181222 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 >> active+clean] find_object_context 6:2bf9e020:::eb359f44-3316-4cd >> 3-9006-d416c21e0745.1204464.6_2018%2f05%2f14%2fYWRjN >> mZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL3d3dy1sdS0wM >> S0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjYxY >> 2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head @head oi=6:2bf9e020: >> ::eb359f44-3316-4cd3-9006-d416c21e0745.1204464.6_2018%2f05% >> 2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL >> 3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjY >> xY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head(236'20762 >> client.2597258.0:212394860 dirty|data_digest|omap_digest s 708737 uv 20762 >> dd c68e9ed od ffffffff alloc_hint [0 0 0]) >> 2018-05-14 13:47:08.181252 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( >> v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 >> lis/c 223/223 les/c/f 224/224/0 223/223/ >> 212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 
236'20881 active+clean] execute_ctx 0x564545128700
>> 2018-05-14 13:47:08.181265 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean] execute_ctx 6:2bf9e020:::eb359f44-3316-4cd3-9006-d416c21e0745.1204464.6_2018%2f05%2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head [getxattrs,stat,read 0~4194304] ov 236'20762
>> 2018-05-14 13:47:08.181282 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean] taking ondisk_read_lock
>> 2018-05-14 13:47:08.181292 7fb8055d3700 10 osd.21 pg_epoch: 236 pg[6.1d4( v 236'20882 (236'19289,236'20882] local-lis/les=223/224 n=20791 ec=145/132 lis/c 223/223 les/c/f 224/224/0 223/223/212) [21,29,17] r=0 lpr=223 crt=236'20882 lcod 236'20881 mlcod 236'20881 active+clean] do_osd_op 6:2bf9e020:::eb359f44-3316-4cd3-9006-d416c21e0745.1204464.6_2018%2f05%2f14%2fYWRjNmZmNzQzODI2ZGQzOTc3ZjFiNGMxZjIxOGZlYzQvaHR0cDovL3d3dy1sdS0wMS0zNi52b3hpbXBsYW50LmNvbS9yZWNvcmRzLzIwMTgvMDUvMTQvOTRlNjYxY2JiZjU3MTk4NS4xNTI2MjkwMzQ0Ljk2NjQ5MS5tcDM-:head [getxattrs,stat,read 0~4194304]
>>
>> What are the key words to identify the problem if I want to catch this line?
>>
>> 2018-05-14 13:47:07.764236 mon.storage-ru1-osd1 mon.0 185.164.149.2:6789/0 553903 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
>>
>> Thanks in advance.
>>
>> Grigory Murashov
>> Voximplant
>>
>> 10.05.2018 20:12, David Turner wrote:
>>
>> 2.
When logging, the first number in 1/5 is what's written to the log file, the second is what's temporarily stored in memory. If you want to increase logging, you need to increase both numbers, to 20/20 or 10/10. You can also just set it to 20 or 10 and ceph will set both to the same number. I personally set both numbers to remind myself that the defaults aren't the same when I set it back.

>> 3. You are not accessing something stored within the ceph cluster, so it isn't using the admin cephx key that you have. It is accessing the daemon socket in the OS, so you need the proper permissions to access it. The daemon sockets are located in /var/run/ceph/, which is a folder without any public permissions. You could use `sudo -u ceph ceph daemon osd.15 dump_historic_ops` or just `sudo ceph daemon osd.15 dump_historic_ops`, as root can access the daemon socket as well.
>>
>> On Thu, May 10, 2018 at 7:14 AM Grigory Murashov <[email protected]> wrote:
>>
>>> Hi JC!
>>>
>>> Thanks for your answer first.
>>>
>>> 1. I have added the output of ceph health detail to Zabbix in case of warning. So every time I will see which OSD the problem is with.
>>>
>>> 2. I have the default level of all logs. As I see here http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/ the default log level for OSD is 1/5. Should I try debug osd = 1/20, or would 1/10 be enough here?
>>>
>>> 3. Any thoughts on why I get Permission denied? All of my sockets are also defaults.
>>>
>>> [cephuser@osd3 ~]$ ceph daemon osd.15 dump_historic_ops
>>> admin_socket: exception getting command descriptions: [Errno 13] Permission denied
>>>
>>> Thanks in advance.
>>>
>>> Grigory Murashov
>>> Voximplant
>>>
>>> 08.05.2018 17:31, Jean-Charles Lopez wrote:
>>> > Hi Grigory,
>>> >
>>> > are these lines the only lines in your log file for OSD 15?
>>> >
>>> > Just for sanity, what are the log levels you have set, if any, in your config file away from the default?
If you set all log levels to 0 like some >>> people do you may want to simply go back to the default by commenting out >>> the debug_ lines in your config file. If you want to see something more >>> detailed you can indeed increase the log level to 5 or 10. >>> > >>> > What you can also do is to use the admin socket on the machine to see >>> what operations are actually blocked: ceph daemon osd.15 dump_ops_in_flight >>> and ceph daemon osd.15 dump_historic_ops. >>> > >>> > These two commands and their output will show you what exact >>> operations are blocked and will also point you to the other OSDs this OSD >>> is working with to serve the IO. May be the culprit is actually one of the >>> OSDs handling the subops or it could be a network problem. >>> > >>> > Regards >>> > JC >>> > >>> >> On May 8, 2018, at 03:11, Grigory Murashov <[email protected]> >>> wrote: >>> >> >>> >> Hello Jean-Charles! >>> >> >>> >> I have finally catch the problem, It was at 13-02. >>> >> >>> >> [cephuser@storage-ru1-osd3 ~]$ ceph health detail >>> >> HEALTH_WARN 18 slow requests are blocked > 32 sec >>> >> REQUEST_SLOW 18 slow requests are blocked > 32 sec >>> >> 3 ops are blocked > 65.536 sec >>> >> 15 ops are blocked > 32.768 sec >>> >> osd.15 has blocked requests > 65.536 sec >>> >> [cephuser@storage-ru1-osd3 ~]$ >>> >> >>> >> >>> >> But surprise - there is no information in ceph-osd.15.log that time >>> >> >>> >> >>> >> 2018-05-08 12:54:26.105919 7f003f5f9700 4 rocksdb: (Original Log >>> Time 2018/05/08-12:54:26.105843) EVENT_LOG_v1 {"time_micros": >>> 1525773266105834, "job": 2793, "event": "trivial_move", "dest >>> >> ination_level": 3, "files": 1, "total_files_size": 68316970} >>> >> 2018-05-08 12:54:26.105926 7f003f5f9700 4 rocksdb: (Original Log >>> Time 2018/05/08-12:54:26.105854) [/home/jenkins-build/build/wor >>> kspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABL >>> >> E_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2. 
>>> 4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1537] >>> [default] Moved #1 files to level-3 68316970 bytes OK >>> >> : base level 1 max bytes base 268435456 files[0 4 45 403 722 0 0] max >>> score 0.98 >>> >> >>> >> 2018-05-08 13:07:29.711425 7f004f619700 4 rocksdb: >>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/ >>> AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/ >>> MACHINE_SIZE/huge/r >>> >> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:684] >>> reusing log 8051 from recycle list >>> >> >>> >> 2018-05-08 13:07:29.711497 7f004f619700 4 rocksdb: >>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/ >>> AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/ >>> MACHINE_SIZE/huge/r >>> >> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_write.cc:725] >>> [default] New memtable created with log file: #8089. Immutable memtables: 0. >>> >> >>> >> 2018-05-08 13:07:29.726107 7f003fdfa700 4 rocksdb: (Original Log >>> Time 2018/05/08-13:07:29.711524) [/home/jenkins-build/build/wor >>> kspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABL >>> >> E_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2. >>> 4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/db_impl_compaction_flush.cc:1158] >>> Calling FlushMemTableToOutputFile with column family >>> >> [default], flush slots available 1, compaction slots allowed 1, >>> compaction slots scheduled 1 >>> >> 2018-05-08 13:07:29.726124 7f003fdfa700 4 rocksdb: >>> [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/ >>> AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/ >>> MACHINE_SIZE/huge/r >>> >> elease/12.2.4/rpm/el7/BUILD/ceph-12.2.4/src/rocksdb/db/flush_job.cc:264] >>> [default] [JOB 2794] Flushing memtable with next log file: 8089 >>> >> >>> >> Should I have some deeply logging? 
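For deeper logging without a restart, something along these lines should work (a sketch; 10/10 as discussed elsewhere in this thread — remember to revert afterwards, since level 10-20 logs grow very quickly):

```shell
# Raise osd debug logging at runtime; 10/10 means level 10 to the log
# file and level 10 to the in-memory ring buffer.
ceph tell osd.21 injectargs '--debug-osd 10/10'

# Or via the local admin socket on the OSD host:
sudo ceph daemon osd.21 config set debug_osd 10/10

# Back to the defaults afterwards:
ceph tell osd.21 injectargs '--debug-osd 1/5'
```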
>>> >>
>>> >> Grigory Murashov
>>> >> Voximplant
>>> >>
>>> >> 07.05.2018 18:59, Jean-Charles Lopez wrote:
>>> >>> Hi,
>>> >>>
>>> >>> ceph health detail
>>> >>>
>>> >>> This will tell you which OSDs are experiencing the problem, so you can then go and inspect the logs and use the admin socket to find out which requests are at the source.
>>> >>>
>>> >>> Regards
>>> >>> JC
>>> >>>
>>> >>>> On May 7, 2018, at 03:52, Grigory Murashov <[email protected]> wrote:
>>> >>>>
>>> >>>> Hello!
>>> >>>>
>>> >>>> I'm not much experienced in ceph troubleshooting, that's why I ask for help.
>>> >>>>
>>> >>>> I have multiple warnings coming from zabbix as a result of ceph -s
>>> >>>>
>>> >>>> REQUEST_SLOW: HEALTH_WARN : 21 slow requests are blocked > 32 sec
>>> >>>>
>>> >>>> I don't see any hardware problems at that time.
>>> >>>>
>>> >>>> I'm able to find the same strings in ceph.log and ceph-mon.log, like
>>> >>>>
>>> >>>> 2018-05-07 12:37:57.375546 7f3037dae700 0 log_channel(cluster) log [WRN] : Health check failed: 12 slow requests are blocked > 32 sec (REQUEST_SLOW)
>>> >>>>
>>> >>>> Now it's important to find out the root of the problem.
>>> >>>>
>>> >>>> How to find out:
>>> >>>>
>>> >>>> 1. which OSDs are affected?
>>> >>>>
>>> >>>> 2. which particular requests were slowed and blocked?
>>> >>>>
>>> >>>> I assume I need more detailed logging - how do I do that?
>>> >>>>
>>> >>>> Appreciate your help.
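On question 2., the admin-socket output mentioned above can be boiled down to one line per op. A rough sketch — the `summarize_ops` helper is made up, and the field names assume the Luminous JSON layout of dump_historic_ops ("description" and "duration" members per op):

```shell
# Reduce dump_historic_ops JSON to "<duration>s <description>" lines.
summarize_ops() {
    awk -F'"' '
        /"description":/ { desc = $4 }          # remember the op text
        /"duration":/    {
            gsub(/[ ,]/, "", $3)                # ": 65.5," -> ":65.5"
            sub(/:/, "", $3)                    # ":65.5"   -> "65.5"
            print $3 "s " desc
        }
    '
}

# Typical use:
#   sudo ceph daemon osd.15 dump_historic_ops | summarize_ops
```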
>>> >>>> >>> >>>> -- >>> >>>> Grigory Murashov >>> >>>> >>> >>>> _______________________________________________ >>> >>>> ceph-users mailing list >>> >>>> [email protected] >>> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> _______________________________________________ >>> ceph-users mailing list >>> [email protected] >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >> >> >> >> _______________________________________________ >> ceph-users mailing list >> [email protected] >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > > > -- > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > <https://maps.google.com/?q=Freseniusstr.+31h+%0D%0A++++++++++++++++81247+M%C3%BCnchen&entry=gmail&source=g> > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > > -- -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90
