Yes, if you dig through older mails, you will see I reported that as an Ubuntu 
kernel bug (not sure about other Linux flavors). vm.min_free_kbytes is the 
way to work around it.


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tuomas 
Juntunen
Sent: Friday, August 07, 2015 1:57 PM
To: 'Константин Сахинов'; 'Quentin Hartman'
Cc: 'ceph-users'
Subject: Re: [ceph-users] Flapping OSD's when scrubbing

Hi

Thanks, we were able to resolve the problem by disabling swap completely; there is 
no need for it anyway.

Memory was also fragmenting, since all of it was being used for caching.

Running “perf top”, we saw that freeing blocks of memory took all the CPU power:

Samples: 4M of event 'cycles', Event count (approx.): 965471653281
 71.95%  [kernel]                 [k] isolate_freepages_block
 10.37%  [kernel]                 [k] __reset_isolation_suitable
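
(For what it's worth, isolate_freepages_block and __reset_isolation_suitable live in 
the kernel's memory compaction code, so this profile points at compaction fighting 
fragmentation. One rough way to watch for it on a node, assuming your kernel exposes 
the counters, is:

    grep compact /proc/vmstat
    cat /proc/buddyinfo

A compact_stall counter that keeps climbing while the OSDs flap, together with 
buddyinfo showing almost no high-order pages free, tells the same story.)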

By forcing the systems to keep 10 GB of memory free at all times, the problem 
was solved.

We added

vm.min_free_kbytes = 10000000

to /etc/sysctl.conf
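
The same value can also be applied at runtime without a reboot and checked 
afterwards, for example:

    sysctl -w vm.min_free_kbytes=10000000
    sysctl vm.min_free_kbytes

(10000000 kB is roughly 10 GB reserved per node; the right number will obviously be 
smaller on machines with less memory.)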

We don’t know why this happens; is it a “problem” of the kernel version we are 
running (Ubuntu 14.04, 3.13.0-32-generic) or something else?

Br,
Tuomas

From: Константин Сахинов [mailto:sakhi...@gmail.com]
Sent: 7 August 2015 21:15
To: Tuomas Juntunen; Quentin Hartman
Cc: ceph-users
Subject: Re: [ceph-users] Flapping OSD's when scrubbing

Hi!

I once faced such behavior on my home cluster. When my OSDs went 
down, I noticed that the node was using swap despite having sufficient memory. Tuning 
/proc/sys/vm/swappiness to 0 helped to solve the problem.
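
Something like this should do it; the sysctl -w takes effect immediately and the 
sysctl.conf line keeps it across reboots:

    sysctl -w vm.swappiness=0
    echo "vm.swappiness = 0" >> /etc/sysctl.conf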

On Fri, 7 Aug 2015 at 20:41, Tuomas Juntunen 
<tuomas.juntu...@databasement.fi> wrote:
Thanks

We'll play with the values a bit and see what happens.

Br,
Tuomas


From: Quentin Hartman 
[mailto:qhart...@direwolfdigital.com]
Sent: 7 August 2015 20:32
To: Tuomas Juntunen
Cc: ceph-users
Subject: Re: [ceph-users] Flapping OSD's when scrubbing

That kind of behavior is usually caused by the OSDs getting busy enough that 
they aren't answering heartbeats in a timely fashion. It can also happen if you 
have any network flakiness and heartbeats are getting lost because of that.

I think (though I'm not positive) that increasing your heartbeat interval may 
help. Also, the number of threads you have per OSD seems 
potentially problematic. If you've got 24 OSDs per machine and each one is 
running 12 op threads, that's 288 threads on 12 cores just for requests. Plus 
the disk threads, plus the filestore op threads... That level of thread 
contention seems like it might be contributing to the missed heartbeats. But 
again, that's conjecture; I've not worked with a setup as dense as yours.
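
If you want the real numbers rather than my back-of-the-envelope math, something 
like this (just a sketch) shows the thread count per ceph-osd process and a rough 
total for the node:

    ps -o nlwp= -p "$(pidof ceph-osd)"   # threads per ceph-osd process
    ps -eL | grep -c ceph-osd            # rough total across the node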

QH

On Fri, Aug 7, 2015 at 11:21 AM, Tuomas Juntunen 
<tuomas.juntu...@databasement.fi> wrote:
Hi

We are experiencing an annoying problem where scrubs make OSDs flap down and 
make the Ceph cluster unusable for a couple of minutes.

Our cluster consists of three nodes connected with 40 Gbit InfiniBand using 
IPoIB, each with 2x 6-core X5670 CPUs and 64 GB of memory.
Each node has 6 SSDs serving as journals for 12 OSDs on 2 TB disks (Fast pools), plus 
another 12 OSDs on 4 TB disks (Archive pools), which keep their journals on the same disk.

It seems that our cluster is constantly scrubbing; we rarely see only 
active+clean. Below is the status at the moment:

    cluster a2974742-3805-4cd3-bc79-765f2bddaefe
     health HEALTH_OK
     monmap e16: 4 mons at 
{lb1=10.20.60.1:6789/0,lb2=10.20.60.2:6789/0,nc1=10.20.50.2:6789/0,nc2=10.20.50.3:6789/0}
            election epoch 1838, quorum 0,1,2,3 nc1,nc2,lb1,lb2
     mdsmap e7901: 1/1/1 up {0=lb1=up:active}, 4 up:standby
     osdmap e104824: 72 osds: 72 up, 72 in
      pgmap v12941402: 5248 pgs, 9 pools, 19644 GB data, 4810 kobjects
            59067 GB used, 138 TB / 196 TB avail
                5241 active+clean
                   7 active+clean+scrubbing

When the OSDs go down, the load on a node first goes high during scrubbing; after 
that some OSDs go down, and about 30 seconds later they are back up. They are not 
really going down, but are marked as down. It then takes around a couple of 
minutes for everything to be OK again.
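
(For what it's worth, when an OSD is only marked down rather than actually dying, 
its log shows "wrongly marked me down" entries, which is one way to confirm it, 
assuming the default log location, e.g.:

    grep -c "wrongly marked me down" /var/log/ceph/ceph-osd.*.log
)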

Any suggestions on how to fix this? We can’t go to production while this behavior 
exists.

Our config is below:

[global]
fsid = a2974742-3805-4cd3-bc79-765f2bddaefe
mon_initial_members = lb1,lb2,nc1,nc2
mon_host = 10.20.60.1,10.20.60.2,10.20.50.2,10.20.50.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

osd pool default pg num = 128
osd pool default pgp num = 128

public network = 10.20.0.0/16

        osd_op_threads = 12
        osd_op_num_threads_per_shard = 2
        osd_op_num_shards = 6
        #osd_op_num_sharded_pool_threads = 25
        filestore_op_threads = 12
        ms_nocrc = true
        filestore_fd_cache_size = 64
        filestore_fd_cache_shards = 32
        ms_dispatch_throttle_bytes = 0
        throttler_perf_counter = false

mon osd min down reporters = 25

[osd]
osd scrub max interval = 1209600
osd scrub min interval = 604800
osd scrub load threshold = 3.0
osd max backfills = 1
osd recovery max active = 1
# IO Scheduler settings
osd scrub sleep = 1.0
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
osd scrub chunk max = 1
osd scrub chunk min = 1
osd deep scrub stride = 1048576
filestore queue max ops = 10000
filestore max sync interval = 30
filestore min sync interval = 29

osd deep scrub interval = 2592000
        osd heartbeat grace = 240
        osd heartbeat interval = 12
        osd mon report interval max = 120
        osd mon report interval min = 5

        osd_client_message_size_cap = 0
        osd_client_message_cap = 0
        osd_enable_op_tracker = false

        osd crush update on start = false

[client]
        rbd cache = true
        rbd cache size = 67108864 # 64mb
        rbd cache max dirty = 50331648 # 48mb
        rbd cache target dirty = 33554432 # 32mb
        rbd cache writethrough until flush = true # this is the default
        rbd cache max dirty age = 2
        admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok


Br,
Tuomas

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
