> On 7 November 2017 at 10:14, Jan Pekař - Imatic <[email protected]> wrote:
>
>
> Additional info - it is not librbd related. I mapped the disk through
> rbd map and the result was the same - the virtuals were stuck/frozen.
> It happened exactly when this appeared in my log:
>
Why aren't you using librbd? Is there a specific reason for that? With
Qemu/KVM/libvirt I always suggest using librbd.

In addition, what kernel version are you running?

Wido

> Nov 7 10:01:27 imatic-hydra01 kernel: [2266883.493688] libceph: osd6 down
>
> I can attach strace to the qemu process, and I see this running in a loop:
>
> root@imatic-hydra01:/usr/local/libvirt/bin# strace -p 31963
> strace: Process 31963 attached
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
> events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
> events=POLLIN}], 6, {tv_sec=0, tv_nsec=355313847}, NULL, 8) = 0 (Timeout)
> poll([{fd=10, events=POLLOUT}], 1, 0) = 1 ([{fd=10,
> revents=POLLOUT|POLLHUP}])
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
> events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
> events=POLLIN}], 6, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
> poll([{fd=10, events=POLLOUT}], 1, 0) = 1 ([{fd=10,
> revents=POLLOUT|POLLHUP}])
> ppoll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7,
> events=POLLIN}, {fd=8, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46,
> events=POLLIN}], 6, {tv_sec=0, tv_nsec=493273904}, NULL, 8) = 0 (Timeout)
> Process 31963 detached
> <detached ...>
>
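
In the strace output above, poll() on fd 10 keeps returning POLLHUP, which per poll(2) means the other end of that descriptor has hung up. Whether that is related to the freeze is not established here, but it may be worth finding out what fd 10 is. A minimal sketch, assuming raw strace output as input; the function names are hypothetical and describe_fd relies on /proc, so it is Linux only:

```python
import os
import re

# Matches result entries such as "{fd=10, revents=POLLOUT|POLLHUP}",
# including the case where the mail client wrapped the line after the comma.
REVENTS = re.compile(r"\{fd=(\d+),\s*revents=([A-Z|]+)\}")


def fds_with_hangup(strace_text):
    """Return the sorted file descriptors whose revents include POLLHUP."""
    hung = set()
    for fd, flags in REVENTS.findall(strace_text):
        if "POLLHUP" in flags.split("|"):
            hung.add(int(fd))
    return sorted(hung)


def describe_fd(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points at (socket, pipe, file path)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Feeding it the poll() lines from the trace should single out fd 10, and describe_fd(31963, 10), run as root on the hypervisor, would then show whether that descriptor is a socket, a pipe or a file.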
> Can you please tell me briefly what I should debug and how to do
> that? I'm a newbie at gdb debugging.
> It is not a problem inside the virtual machine (like a disk not responding),
> because I can't even get to the VNC console and there is no kernel panic
> visible on it. Also, I would expect the guest kernel to answer ping even
> without the disk being available.
>
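
For the backtrace from all threads that Jason asked for, something along these lines should work once the debug packages are installed. This is only a sketch: the helper names are made up for illustration, and 31963 is simply the qemu PID from the strace session above. It uses nothing beyond gdb's standard -p, -batch and -ex options.

```python
import shlex
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that prints a backtrace of every thread."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running process
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # backtrace for each thread
    ]


def format_command(pid):
    """Return the same command as a copy-pasteable shell line."""
    return shlex.join(gdb_backtrace_command(pid))


def collect_backtrace(pid):
    """Run gdb and return its output (needs gdb installed and root)."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

format_command(31963) gives the plain shell command if you prefer to run it by hand and redirect the output to a file. gdb pauses the process while it is attached, which should not matter for a guest that is already frozen.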
> Thank you
>
> With regards
> Jan Pekar
>
>
>
> On 7.11.2017 00:30, Jason Dillaman wrote:
> > If you could install the debug packages and get a gdb backtrace from all
> > threads it would be helpful. librbd doesn't utilize any QEMU threads so
> > even if librbd was deadlocked, the worst case that I would expect would
> > be your guest OS complaining about hung kernel tasks related to disk IO
> > (since the disk wouldn't be responding).
> >
> > On Mon, Nov 6, 2017 at 6:02 PM, Jan Pekař - Imatic <[email protected]> wrote:
> >
> > Hi,
> >
> > I'm using Debian stretch with ceph 12.2.1-1~bpo80+1 and qemu
> > 1:2.8+dfsg-6+deb9u3.
> > I'm running 3 nodes with 3 monitors and 8 OSDs on my nodes, all on IPv6.
> >
> > While testing the cluster, I found a strange and severe problem.
> > On the first node I'm running qemu guests with a librados disk connection
> > to the cluster, with all 3 monitors listed in the connection.
> > On the second node I stopped the mon and osd with the command
> >
> > kill -STOP MONPID OSDPID
> >
> > Within one minute all my qemu guests on the first node freeze, so they
> > don't even respond to ping. On the VNC screen there is no error (disk or
> > kernel panic); they just hang forever with no console response. Even
> > starting the MON and OSD on the stopped host doesn't get them running again.
> > Destroying the qemu domain and starting it again is the only solution.
> >
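
For anyone trying to reproduce this: kill -STOP suspends the daemons instead of terminating them, so their sockets stay open while nothing answers, and kill -CONT resumes them afterwards. A small sketch of the same pause/resume cycle, with hypothetical helper names; the PIDs are whatever your mon and osd processes happen to be:

```python
import os
import signal


def pause(pids, send=os.kill):
    """Suspend each process, like `kill -STOP PID ...`."""
    for pid in pids:
        send(pid, signal.SIGSTOP)


def resume(pids, send=os.kill):
    """Let each suspended process continue, like `kill -CONT PID ...`."""
    for pid in pids:
        send(pid, signal.SIGCONT)
```

The send parameter exists only so the helpers can be exercised without signalling real processes; by default they call os.kill directly.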
> > This happens even if the virtual machine has all its primary OSDs on
> > OSDs other than the one I stopped - so it is not writing to the
> > stopped OSD as primary.
> >
> > If I stop only the OSD and the MON keeps running, or I stop only the MON
> > and the OSD keeps running, everything looks OK.
> >
> > When I stop the MON and OSD, I can see in the log osd.0 1300
> > heartbeat_check: no reply from ... as usual when an OSD fails. During
> > this the virtuals are still running, but after that they all stop.
> >
> > What should I send you to debug this problem? Without a fix for that,
> > ceph is not reliable for me.
> >
> > Thank you
> > With regards
> > Jan Pekar
> > Imatic
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > --
> > Jason
>
> --
> ============
> Ing. Jan Pekař
> [email protected] | +420603811737
> ----
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> ============