Hi Ranjan,
Thanks for your reply. I did set the noscrub and nodeep-scrub flags, but the
scrub that was already running does not finish; the scrubbing operation is
always stuck on the same PG (20.1e).
$ ceph pg dump | grep scrub
dumped all in format plain
pg_stat: 20.1e   objects: 25189   mip: 0   degr: 0   misp: 0   unf: 0
bytes: 98359116362   log: 3048   disklog: 3048
state: active+clean+scrubbing   state_stamp: 2017-08-21 04:55:13.354379
v: 6930'23966663   reported: 6930:20949058
up: [29,31,3]   up_primary: 29   acting: [29,31,3]   acting_primary: 29
last_scrub: 6712'22950171   scrub_stamp: 2017-08-20 04:46:59.208792
last_deep_scrub: 6712'22950171   deep_scrub_stamp: 2017-08-20 04:46:59.208792
$ ceph -s
cluster ****
health HEALTH_WARN
33 requests are blocked > 32 sec
noscrub,nodeep-scrub flag(s) set
monmap e9: 3 mons at
{ceph-mon01=**:6789/0,ceph-mon02=**:6789/0,ceph-mon03=**:6789/0}
election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
osdmap e6930: 36 osds: 36 up, 36 in
flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
pgmap v17667617: 1408 pgs, 5 pools, 24779 GB data, 6494 kobjects
70497 GB used, 127 TB / 196 TB avail
1407 active+clean
1 active+clean+scrubbing
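
For reference, this is roughly what I am planning to try next to get this PG
unstuck (osd.29 is the primary of pg 20.1e in the dump above; please correct
me if these are not the right commands for Jewel):

$ ceph pg 20.1e query                    # inspect the stuck PG in detail
$ ceph daemon osd.29 dump_ops_in_flight  # on the host of osd.29: list the blocked ops
$ systemctl restart ceph-osd@29          # restart the primary OSD as a last resort
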
Thanks,
Ramazan
> On 22 Aug 2017, at 18:52, Ranjan Ghosh <[email protected]> wrote:
>
> Hi Ramazan,
>
> I'm no Ceph expert, but what I can say from my experience using Ceph is:
>
> 1) During scrubbing, Ceph can be extremely slow. This is probably where
> your "blocked requests" are coming from. BTW: perhaps you can even find out
> which processes are currently blocked in uninterruptible sleep ("D" state)
> with: ps aux | awk '$8 ~ /D/'. You might even want to kill some of those
> and/or shut down services in order to relieve some stress from the machine
> until it recovers.
>
> 2) I usually have the following in my ceph.conf. This restricts scrubbing to
> between midnight and 6 AM (osd_scrub_begin_hour defaults to 0; hopefully the
> time of least demand, adjust as necessary) - and runs it with the lowest I/O
> priority.
>
> #Reduce impact of scrub.
> osd_disk_thread_ioprio_priority = 7
> osd_disk_thread_ioprio_class = "idle"
> osd_scrub_end_hour = 6
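>
> If you don't want to wait for an OSD restart, I believe you can also inject
> the scrub window at runtime with something like the following (untested on my
> side; the ioprio options may only take effect after a restart):
>
> ceph tell osd.* injectargs '--osd_scrub_begin_hour 0 --osd_scrub_end_hour 6'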
>
> 3) The scrubbing begin and end hour settings will always work. The
> low-priority mode, however, works (AFAIK!) only with the CFQ I/O scheduler.
> You can show your current scheduler like this (replace sda with your device):
>
> cat /sys/block/sda/queue/scheduler
>
> You can also echo to this file to set a different scheduler.
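>
> For example (as root, assuming your device is sda and your kernel still
> offers the cfq scheduler):
>
> echo cfq > /sys/block/sda/queue/scheduler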
>
>
> With these settings you can perhaps alleviate the problem enough that the
> scrubbing runs over many nights until it finishes. Again, AFAIK, it doesn't
> have to finish in one night; it will continue the next night and so on.
>
> The Ceph experts say scrubbing is important. Don't know why, but I just
> believe them. They've built this complex stuff after all :-)
>
> Thus, you can use "noscrub"/"nodeep-scrub" to quickly get a hung server back
> to work, but you should not let it run like this forever and a day.
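>
> To turn scrubbing back on later, once the cluster has calmed down, you would
> simply unset the flags again:
>
> ceph osd unset noscrub
> ceph osd unset nodeep-scrub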
>
> Hope this helps at least a bit.
>
> BR,
>
> Ranjan
>
>
> Am 22.08.2017 um 15:20 schrieb Ramazan Terzi:
>> Hello,
>>
>> I have a Ceph Cluster with specifications below:
>> 3 x Monitor node
>> 6 x Storage Node (6 disks per storage node, 6 TB SATA disks, all disks have
>> SSD journals)
>> Dedicated public and private (cluster) networks. All NICs are 10 Gbit/s
>> osd pool default size = 3
>> osd pool default min size = 2
>>
>> Ceph version is Jewel 10.2.6.
>>
>> My cluster is active and a lot of virtual machines are running on it (Linux
>> and Windows VMs, database clusters, web servers, etc.).
>>
>> During normal use, the cluster slowly went into a state of blocked requests.
>> The number of blocked requests is periodically increasing. All OSDs seem
>> healthy. Benchmarks, iowait checks, and network tests all succeed.
>>
>> Yesterday, 08:00:
>> $ ceph health detail
>> HEALTH_WARN 3 requests are blocked > 32 sec; 3 osds have slow requests
>> 1 ops are blocked > 134218 sec on osd.31
>> 1 ops are blocked > 134218 sec on osd.3
>> 1 ops are blocked > 8388.61 sec on osd.29
>> 3 osds have slow requests
>>
>> Today, 16:05:
>> $ ceph health detail
>> HEALTH_WARN 32 requests are blocked > 32 sec; 3 osds have slow requests
>> 1 ops are blocked > 134218 sec on osd.31
>> 1 ops are blocked > 134218 sec on osd.3
>> 16 ops are blocked > 134218 sec on osd.29
>> 11 ops are blocked > 67108.9 sec on osd.29
>> 2 ops are blocked > 16777.2 sec on osd.29
>> 1 ops are blocked > 8388.61 sec on osd.29
>> 3 osds have slow requests
>>
>> $ ceph pg dump | grep scrub
>> dumped all in format plain
>> pg_stat: 20.1e   objects: 25183   mip: 0   degr: 0   misp: 0   unf: 0
>> bytes: 98332537930   log: 3066   disklog: 3066
>> state: active+clean+scrubbing   state_stamp: 2017-08-21 04:55:13.354379
>> v: 6930'23908781   reported: 6930:20905696
>> up: [29,31,3]   up_primary: 29   acting: [29,31,3]   acting_primary: 29
>> last_scrub: 6712'22950171   scrub_stamp: 2017-08-20 04:46:59.208792
>> last_deep_scrub: 6712'22950171   deep_scrub_stamp: 2017-08-20 04:46:59.208792
>>
>> The active scrub has not finished (it has been running for about 24 hours).
>> I have not restarted any OSDs in the meantime.
>> I'm thinking of setting the noscrub, nodeep-scrub, norebalance, nobackfill,
>> and norecover flags and then restarting OSDs 3, 29 and 31. Would this solve
>> my problem? Or does anyone have a suggestion about this problem?
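>>
>> Concretely, I am thinking of something like the following (please correct me
>> if this is not the right approach):
>>
>> $ ceph osd set noscrub
>> $ ceph osd set nodeep-scrub
>> $ ceph osd set norebalance
>> $ ceph osd set nobackfill
>> $ ceph osd set norecover
>> $ systemctl restart ceph-osd@3   # likewise ceph-osd@29 and ceph-osd@31 on their hosts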
>>
>> Thanks,
>> Ramazan
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com