Thanks Jake, can you confirm which ceph version you are testing on when you
see the out-of-memory kills? There is already a memory leak issue reported
against kraken v11.2.0, which is addressed in this tracker:
http://tracker.ceph.com/issues/18924

# ceph -v
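
If it helps, a rough sketch for grabbing this from every OSD node in one go
(the ceph1..ceph7 hostnames are only placeholders, based on the node names
in your logs):

#!/bin/bash
# Print the installed ceph version on each OSD node.
# Hostnames are placeholders; only ceph1 and ceph4 appear in the logs below.
for host in ceph1 ceph2 ceph3 ceph4 ceph5 ceph6 ceph7; do
    echo -n "$host: "
    ssh "$host" ceph -v
done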

OK, so you are mapping ceph as an RBD device, mounting it, and writing into
it, something like the sketch below?
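
(The pool/image names and mountpoint here are placeholders, not anything
from your setup.)

#!/bin/bash
# Rough sketch of mapping an RBD image and writing into it.
# Assumes a pool called "testpool" already exists; "testimg" and
# /mnt/rbdtest are placeholders.
rbd create testpool/testimg --size 102400    # 100GB image (size in MB)
rbd map testpool/testimg                     # appears as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /mnt/rbdtest
mount /dev/rbd0 /mnt/rbdtest
dd if=/dev/zero of=/mnt/rbdtest/testfile bs=1M count=10240   # write 10GB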

We are discussing a luminous v12.0.3 issue here; I think we are all on the
same page.

Thanks
Jayaram


On Thu, Jun 8, 2017 at 8:13 PM, Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:

> Hi Mark / Jayaram,
>
> After running the cluster last night, I noticed lots of
> "Out Of Memory" errors in /var/log/messages; many of these correlate
> with dead OSDs. If this is the problem, it might now be another case of
> the high memory use issues reported in Kraken.
>
> e.g. my script logs:
> Thu 8 Jun 08:26:37 BST 2017  restart OSD  1
>
> and /var/log/messages states...
>
> Jun  8 08:26:35 ceph1 kernel: Out of memory: Kill process 7899
> (ceph-osd) score 113 or sacrifice child
> Jun  8 08:26:35 ceph1 kernel: Killed process 7899 (ceph-osd)
> total-vm:8569516kB, anon-rss:7518836kB, file-rss:0kB, shmem-rss:0kB
> Jun  8 08:26:36 ceph1 systemd: ceph-osd@1.service: main process exited,
> code=killed, status=9/KILL
> Jun  8 08:26:36 ceph1 systemd: Unit ceph-osd@1.service entered failed
> state.
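>
> In case it's useful to others, one way to pull these kills out of the
> logs (same /var/log/messages as above):
>
> # list OOM kills of ceph-osd processes, with timestamps
> grep -i "killed process" /var/log/messages | grep ceph-osd
> # the same events also show up in the kernel ring buffer / journal:
> journalctl -k | grep -i "killed process"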
>
> The OSD nodes have 64GB RAM, presumably enough for 10 OSDs doing
> 4+1 EC?
>
> I've added "bluestore_cache_size = 104857600" to ceph.conf, and am
> retesting. I will see if OSD problems occur, and report back.
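>
> For reference, this is roughly what that looks like (putting it under
> the [osd] section is my assumption; 104857600 bytes = 100MB):
>
> # /etc/ceph/ceph.conf on each OSD node
> [osd]
> bluestore_cache_size = 104857600
>
> # after restarting an OSD, check it picked up the value, e.g. for osd.1:
> ceph daemon osd.1 config get bluestore_cache_size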
>
> As to loading the cluster, I run an rsync job on each node, pulling data
> from an NFS-mounted Isilon. A single node pulls ~200MB/s; with all 7
> nodes running, ceph -w reports between 700 and 1500MB/s of writes
> (a rough sketch of the rsync job is below).
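>
> The paths here are placeholders, not my real ones:
>
> #!/bin/bash
> # pull data from the NFS-mounted Isilon into the ceph-backed mount
> rsync -a --partial /mnt/isilon/data/ /mnt/ceph/data/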
>
> As requested, here is my "restart_OSD_and_log-this.sh" script:
>
> ************************************************************************
> #!/bin/bash
> # Catch single failed OSDs, log the event, and restart them remotely.
> # (Assumes at most one OSD is down at a time.)
> while : ; do
>         # numeric id of a down OSD, e.g. "osd.7" -> "7"
>         OSD=$(ceph osd tree 2> /dev/null | grep down | \
>                 awk '{ print $3 }' | awk -F "." '{ print $2 }')
>         if [ "$OSD" != "" ] ; then
>                 DATE=$(date)
>                 echo "$DATE  restart OSD  $OSD" >> /root/osd_restart_log
>                 echo "OSD $OSD is down, restarting.."
>                 # find which host the OSD lives on, then restart it there
>                 OSDHOST=$(ceph osd find "$OSD" | grep host | awk -F '"' '{ print $4 }')
>                 ssh "$OSDHOST" systemctl restart ceph-osd@"$OSD"
>                 sleep 30
>         else
>                 # \033[K (capital K) erases to the end of the line
>                 echo -ne "\r\033[K"
>                 echo -ne "all OSDs OK"
>         fi
>         sleep 1
> done
> ************************************************************************
>
> thanks again,
>
> Jake
>
> On 08/06/17 12:08, nokia ceph wrote:
> > Hello Mark,
> >
> > Raised tracker for the issue  -- http://tracker.ceph.com/issues/20222
> >
> > Jake, can you share the restart_OSD_and_log-this.sh script?
> >
> > Thanks
> > Jayaram
> >
> > On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett <j...@mrc-lmb.cam.ac.uk> wrote:
> >
> >     Hi Mark & List,
> >
> >     Unfortunately, even when using yesterday's master version of ceph,
> >     I'm still seeing OSDs go down, same error as before:
> >
> >     OSD log shows lots of entries like this:
> >
> >     (osd38)
> >     2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
> >     'tp_osd_tp thread tp_osd_tp' had timed out after 60
> >
> >     (osd3)
> >     2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
> >     'tp_osd_tp thread tp_osd_tp' had timed out after 60
> >     2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
> >     no reply from 10.1.0.86:6811 osd.2 since
> >     back 2017-06-07 17:00:19.640002
> >     front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
> >
> >
> >     [root@ceph4 ceph]# ceph -v
> >     ceph version 12.0.2-2399-ge38ca14
> >     (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
> >
> >
> >     I'll continue running the cluster with my
> >     "restart_OSD_and_log-this.sh" workaround...
> >
> >     thanks again for your help,
> >
> >     Jake
> >
> >     On 06/06/17 15:52, Jake Grimmett wrote:
> >     > Hi Mark,
> >     >
> >     > OK, I'll upgrade to the current master and retest...
> >     >
> >     > best,
> >     >
> >     > Jake
> >     >
> >     > On 06/06/17 15:46, Mark Nelson wrote:
> >     >> Hi Jake,
> >     >>
> >     >> I just happened to notice this was on 12.0.3.  Would it be
> >     >> possible to test this out with current master and see if it still
> >     >> is a problem?
> >     >>
> >     >> Mark
> >     >>
> >     >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
> >     >>> Hi Jake,
> >     >>>
> >     >>> Thanks much.  I'm guessing at this point this is probably a bug.
> >     >>> Would you (or nokiauser) mind creating a bug in the tracker with
> >     >>> a short description of what's going on and the collectl sample
> >     >>> showing this is not IOs backing up on the disk?
> >     >>>
> >     >>> If you want to try it, we have a gdb-based wallclock profiler
> >     >>> that might be interesting to run while it's in the process of
> >     >>> timing out.  It tries to grab 2000 samples from the osd process,
> >     >>> which typically takes about 10 minutes or so.  You'll need to
> >     >>> either change the number of samples to be lower in the python
> >     >>> code (maybe like 50-100), or change the timeout to be something
> >     >>> longer.
> >     >>>
> >     >>> You can find the code here:
> >     >>>
> >     >>> https://github.com/markhpc/gdbprof
> >     >>>
> >     >>> and invoke it like:
> >     >>>
> >     >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
> >     >>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
> >     >>>
> >     >>> where 27962 in this case is the PID of the ceph-osd process.
> >     >>> You'll need gdb with the python bindings and the ceph debug
> >     >>> symbols for it to work.
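> >     >>>
> >     >>> A rough sketch of the prerequisites on a CentOS-style node
> >     >>> (package and unit names are assumptions, adjust for your distro):
> >     >>>
> >     >>> # check gdb was built with python support
> >     >>> gdb -q -batch -ex 'python print("python ok")'
> >     >>> # pull in the ceph debug symbols (needs the debuginfo repo enabled)
> >     >>> debuginfo-install -y ceph
> >     >>> # find the PID of a given OSD, e.g. osd.3
> >     >>> systemctl show -p MainPID ceph-osd@3.service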
> >     >>>
> >     >>> This might tell us over time if the tp_osd_tp processes are
> >     >>> just sitting on pg::locks.
> >     >>>
> >     >>> Mark
> >     >>>
> >     >>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
> >     >>>> Hi Mark,
> >     >>>>
> >     >>>> Thanks again for looking into this problem.
> >     >>>>
> >     >>>> I ran the cluster overnight, with a script checking for dead
> >     >>>> OSDs every second, and restarting them.
> >     >>>>
> >     >>>> 40 OSD failures occurred in 12 hours; some OSDs failed multiple
> >     >>>> times (there are 50 OSDs in the EC tier).
> >     >>>>
> >     >>>> Unfortunately, the output of collectl doesn't appear to show any
> >     >>>> increase in disk queue depth and service times before the OSDs
> >     >>>> die.
> >     >>>>
> >     >>>> I've put a couple of examples of collectl output for the disks
> >     >>>> associated with the OSDs here:
> >     >>>>
> >     >>>> https://hastebin.com/icuvotemot.scala
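> >     >>>>
> >     >>>> (For anyone wanting to reproduce this, something like the
> >     >>>> following captures the per-disk stats with timestamps; the
> >     >>>> "sd" device filter is a placeholder for the OSD data disks:
> >     >>>>
> >     >>>> collectl -sD -oT --dskfilt sd
> >     >>>> )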
> >     >>>>
> >     >>>> please let me know if you need more info...
> >     >>>>
> >     >>>> best regards,
> >     >>>>
> >     >>>> Jake
> >     >>>>
> >     >>>>
> >     >
> >     _______________________________________________
> >     ceph-users mailing list
> >     ceph-users@lists.ceph.com
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
