How did you solve that?

On Fri, Aug 24, 2018, 6:06 AM Andras Pataki <[email protected]> wrote:
> We pin half the OSDs to each socket (and to the corresponding memory).
> Since the disk controller and the network card are connected to only one
> socket, this still probably produces quite a bit of QPI traffic.
>
> It is also worth investigating how the network does under high load. We
> did run into problems where 40Gbps cards dropped packets heavily under
> load.
>
> Andras
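
For illustration, pinning of that sort could be done with a systemd drop-in
per OSD. This is only a sketch, not Andras's exact setup: the OSD id, file
name, and CPU list are made up, the real core numbering has to come from
`numactl --hardware`, and binding memory as well would need something extra
such as numactl --membind (or NUMAPolicy= on newer systemd).

    # Sketch: pin osd.12 to the cores of socket 0; adjust the list to what
    # `numactl --hardware` reports for node 0 on this host.
    numactl --hardware
    mkdir -p /etc/systemd/system/[email protected]
    printf '[Service]\nCPUAffinity=0 1 2 3 4 5 6 7 8 9 10 11 12 13\n' \
        > /etc/systemd/system/[email protected]/numa.conf
    systemctl daemon-reload && systemctl restart ceph-osd@12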
>
>
> On 08/24/2018 05:16 AM, Marc Roos wrote:
> >
> > Can this be related to NUMA issues? I also have dual-processor nodes,
> > and was wondering if there is some guide on how to optimize for NUMA.
> >
> >
> > -----Original Message-----
> > From: Tyler Bishop [mailto:[email protected]]
> > Sent: Friday, August 24, 2018 3:11
> > To: Andras Pataki
> > Cc: [email protected]
> > Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts
> >
> > Thanks for the info. I was investigating bluestore as well. My hosts
> > don't go unresponsive, but I do see parallel I/O slow down.
> >
> > On Thu, Aug 23, 2018, 8:02 PM Andras Pataki
> > <[email protected]> wrote:
> >
> > We are also running some fairly dense nodes with CentOS 7.4 and ran
> > into similar problems. The nodes ran filestore OSDs (Jewel, then
> > Luminous). Sometimes a node would be so unresponsive that one couldn't
> > even ssh to it (even though the root disk was a physically separate
> > drive on a separate controller from the OSD drives). Often these would
> > coincide with kernel stack traces about hung tasks. Initially we did
> > blame high load, etc. from all the OSDs.
> >
> > But then we benchmarked the nodes independently of ceph (with iozone
> > and such) and noticed problems there too. When we started a few dozen
> > iozone processes on separate JBOD drives with xfs, some didn't even
> > start writing a single byte for minutes. The conclusion we came to
> > was that there is some interference among a lot of mounted xfs file
> > systems in the Red Hat 3.10 kernels: some kind of central lock that
> > prevents dozens of xfs file systems from running in parallel. When we
> > did I/O directly to raw devices in parallel, we saw no problems (no
> > high loads, etc.). So we built a newer kernel, and the situation got
> > better. 4.4 is already much better; nowadays we are testing moving
> > to 4.14.
> >
> > Also, migrating to bluestore significantly reduced the load on these
> > nodes too. At busy times, the filestore host loads were 20-30, even
> > higher (on a 28-core node), while the bluestore nodes hummed along at
> > a load of perhaps 6 or 8. This also confirms that somehow lots of xfs
> > mounts don't work well in parallel.
> >
> > Andras
> >
> >
> > On 08/23/2018 03:24 PM, Tyler Bishop wrote:
> > > Yes I've reviewed all the logs from monitor and host. I am not
> > > getting useful errors (or any) in dmesg or general messages.
> > >
> > > I have 2 ceph clusters; the other cluster is 300 SSDs and I never
> > > have issues like this. That's why I'm looking for help.
> > >
> > > On Thu, Aug 23, 2018 at 3:22 PM Alex Gorbachev
> > > <[email protected]> wrote:
> > >> On Wed, Aug 22, 2018 at 11:39 PM Tyler Bishop
> > >> <[email protected]> wrote:
> > >>> During high load testing I'm only seeing user and sys CPU load
> > >>> around 60%... my load doesn't seem to be anything crazy on the
> > >>> host, and iowait stays between 6 and 10%. I have very good
> > >>> `ceph osd perf` numbers too.
> > >>>
> > >>> I am using 10.2.11 Jewel.
> > >>>
> > >>>
> > >>> On Wed, Aug 22, 2018 at 11:30 PM Christian Balzer
> > >>> <[email protected]> wrote:
> > >>>> Hello,
> > >>>>
> > >>>> On Wed, 22 Aug 2018 23:00:24 -0400 Tyler Bishop wrote:
> > >>>>
> > >>>>> Hi, I've been fighting to get good stability on my cluster for
> > >>>>> about 3 weeks now. I am running into intermittent issues with
> > >>>>> OSDs flapping, marking other OSDs down, then going back to a
> > >>>>> stable state for hours and days.
> > >>>>>
> > >>>>> The cluster is 4x Cisco UCS S3260 with dual E5-2660, 256GB RAM,
> > >>>>> and a 40G network to 40G Brocade VDX switches. The OSDs are 6TB
> > >>>>> HGST SAS drives with 400GB HGST SAS 12G SSDs. My configuration
> > >>>>> is 4 journals per host with 12 disks per journal for a total of
> > >>>>> 56 disks per system and 52 OSDs.
> > >>>>>
> > >>>> Any denser and you'd have a storage black hole.
> > >>>>
> > >>>> You already pointed your finger in the (or at least one) right
> > >>>> direction, and everybody will agree that this setup is woefully
> > >>>> underpowered in the CPU department.
> > >>>>
> > >>>>> I am using CentOS 7 with kernel 3.10 and the Red Hat tuned-adm
> > >>>>> profile for throughput-performance enabled.
> > >>>>>
> > >>>> Ceph version would be interesting as well...
> > >>>>
> > >>>>> I have these sysctls set:
> > >>>>>
> > >>>>> kernel.pid_max = 4194303
> > >>>>> fs.file-max = 6553600
> > >>>>> vm.swappiness = 0
> > >>>>> vm.vfs_cache_pressure = 50
> > >>>>> vm.min_free_kbytes = 3145728
> > >>>>>
> > >>>>> I feel like my issue is directly related to the high number of
> > >>>>> OSDs per host, but I'm not sure what issue I'm really running
> > >>>>> into. I believe that I have ruled out network issues: I am able
> > >>>>> to get 38Gbit/s consistently via iperf testing, and jumbo-frame
> > >>>>> pings succeed with the do-not-fragment flag set and an 8972-byte
> > >>>>> packet size.
> > >>>>>
> > >>>> The fact that it all works for days at a time suggests this as
> > >>>> well, but you need to verify these things when they're happening.
> > >>>>
> > >>>>> From FIO testing I seem to be able to get 150-200k IOPS write
> > >>>>> from my rbd clients on 1gbit networking... This is about what I
> > >>>>> expected due to the write penalty and my underpowered CPUs for
> > >>>>> the number of OSDs.
> > >>>>>
> > >>>>> I get these messages, which I believe are normal?
> > >>>>> 2018-08-22 10:33:12.754722 7f7d009f5700  0 -- 10.20.136.8:6894/718902
> > >>>>> >> 10.20.136.10:6876/490574 pipe(0x55aed77fd400 sd=192 :40502 s=2
> > >>>>> pgs=1084 cs=53 l=0 c=0x55aed805bc80).fault with nothing to send,
> > >>>>> going to standby
> > >>>>>
> > >>>> Ignore.
> > >>>>
> > >>>>> Then randomly I'll get a storm of this every few days for 20
> > >>>>> minutes or so:
> > >>>>> 2018-08-22 15:48:32.631186 7f44b7514700 -1 osd.127 37333
> > >>>>> heartbeat_check: no reply from 10.20.142.11:6861 osd.198 since back
> > >>>>> 2018-08-22 15:48:08.052762 front 2018-08-22 15:48:31.282890 (cutoff
> > >>>>> 2018-08-22 15:48:12.630773)
> > >>>>>
> > >>>> Randomly is unlikely.
> > >>>> Again, catch it in the act: atop in huge terminal windows (showing
> > >>>> all CPUs and disks) for all nodes should be very telling;
> > >>>> collecting and graphing this data might work, too.
> > >>>>
> > >>>> My suspects would be deep scrubs and/or high IOPS spikes when this
> > >>>> is happening, starving out OSD processes (CPU-wise; RAM should be
> > >>>> fine, one supposes).
> > >>>>
> > >>>> Christian
> > >>>>
> > >>>>> Please help!!!
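
Christian's advice boils down to catching the event live. The next time a
heartbeat storm starts, something along these lines could confirm or rule
out deep scrubs and the network path (a sketch only; the address and
timestamp pattern come from the logs quoted in this thread, and the
interface name is a placeholder):

    # On a mon host: how many PGs are deep-scrubbing right now?
    ceph -s
    ceph pg dump pgs_brief 2>/dev/null | grep -c 'scrubbing+deep'

    # Did scrubs coincide with the storm? The cluster log usually records them:
    grep deep-scrub /var/log/ceph/ceph.log | grep '2018-08-22 15:4'

    # Temporarily take scrubbing out of the picture while watching atop:
    ceph osd set noscrub && ceph osd set nodeep-scrub
    # (re-enable with `ceph osd unset ...` when done)

    # Re-check the heartbeat path while the storm is happening:
    ping -M do -s 8972 -c 5 10.20.142.11
    ethtool -S <40g-interface> | grep -iE 'drop|discard|err'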
> > >>
> > >> Have you looked at the OSD logs on the OSD nodes by chance? I found
> > >> that correlating the messages in those logs with your master ceph
> > >> log, and also with any messages in syslog or kern.log, can elucidate
> > >> the cause of the problem pretty well.
> > >> --
> > >> Alex Gorbachev
> > >> Storcium
> > >>
> > >>>>
> > >>>> --
> > >>>> Christian Balzer        Network/Systems Engineer
> > >>>> [email protected]        Rakuten Communications
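
A minimal way to do the correlation Alex suggests, assuming default CentOS
log locations (the timestamp patterns below are just the ones from the storm
quoted above):

    # Pull the heartbeat_check storm out of the local OSD logs...
    grep -h 'heartbeat_check: no reply' /var/log/ceph/ceph-osd.*.log | sort | tail -n 50

    # ...then look at the same window in the cluster log (on a mon) and in
    # the kernel/syslog of both the reporting and the reported OSD hosts:
    grep '2018-08-22 15:4' /var/log/ceph/ceph.log
    grep '^Aug 22 15:4' /var/log/messages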
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
