Re: [Gluster-users] VMs blocked for more than 120 seconds

Martin Toth Mon, 13 May 2019 00:35:21 -0700

Cache in qemu is none, which should be correct. This is the full command:

/usr/bin/qemu-system-x86_64 -name one-312 -S -machine 
pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 
4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 
-no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device 
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2


-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
-drive 
file=/var/lib/one//datastores/116/312/disk.0,format=raw,if=none,id=drive-virtio-disk1,cache=none
        -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
-drive 
file=gluster://localhost:24007/imagestore/7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
        -device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
-drive 
file=/var/lib/one//datastores/116/312/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on
        -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0

-netdev tap,fd=26,id=hostnet0 -device 
e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

I’ve highlighted the disks. The first is the VM context disk (FUSE), the second 
is SDA, where the OS is installed (libgfapi), and the third is swap (FUSE).
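
A minimal Python sketch, in case it helps to check the cache setting per disk 
instead of reading the command line by eye. It extracts every -drive argument 
and reports the access path and cache option. The names drive_options and 
summarize are hypothetical, the command line is shortened to the three -drive 
arguments shown above, and it does not handle qemu's ",," escaping of commas in 
file names. Labelling non-gluster:// paths as FUSE is an assumption based on 
the setup described here.

```python
import shlex

# Shortened to the three -drive arguments from the command line above.
cmdline = (
    "qemu-system-x86_64 "
    "-drive file=/var/lib/one//datastores/116/312/disk.0,format=raw,"
    "if=none,id=drive-virtio-disk1,cache=none "
    "-drive file=gluster://localhost:24007/imagestore/"
    "7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,"
    "id=drive-scsi0-0-0-0,cache=none "
    "-drive file=/var/lib/one//datastores/116/312/disk.1,format=raw,"
    "if=none,id=drive-ide0-0-0,readonly=on"
)


def drive_options(cmdline):
    """Return one dict of key=value options per -drive argument."""
    args = shlex.split(cmdline)
    drives = []
    # Pair each argument with the one that follows it.
    for flag, value in zip(args, args[1:]):
        if flag != "-drive":
            continue
        opts = {}
        for item in value.split(","):
            key, _, val = item.partition("=")
            opts[key] = val
        drives.append(opts)
    return drives


def summarize(opts):
    """Describe how one drive is accessed and which cache mode is set."""
    path = opts.get("file", "")
    # Assumption: anything that is not a gluster:// URL is a FUSE-mounted path.
    access = "libgfapi" if path.startswith("gluster://") else "FUSE path"
    cache = opts.get("cache", "not set")
    return f"{opts.get('id', '?')}: {access}, cache={cache}"


for opts in drive_options(cmdline):
    print(summarize(opts))
```

For the command above, this shows cache=none on the first two drives and no 
cache option at all on the read-only third drive.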

Krutika,
I will start profiling on the Gluster volumes and wait for the next VM to fail. 
Then I will send the profiling info once a VM has failed. I suppose this is the 
correct profiling strategy.
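
A rough Python sketch of that strategy, assuming the volume is called 
imagestore (taken from the gluster:// URL above) and that periodic snapshots of 
"gluster volume profile ... info" are wanted so the output around a freeze can 
be picked out later. The helper names, the output directory, and the 5-minute 
interval are hypothetical choices for illustration. It is only a sketch, not a 
tested tool.

```python
import subprocess
import time
from datetime import datetime
from pathlib import Path

VOLUME = "imagestore"  # assumption: volume name from the gluster:// URL


def profile_cmd(action, volume=VOLUME):
    """Build a 'gluster volume profile <volume> <action>' command."""
    return ["gluster", "volume", "profile", volume, action]


def snapshot_path(outdir, now):
    """Timestamped file name so snapshots can be matched to a freeze."""
    return Path(outdir) / f"profile-{now:%Y%m%d-%H%M%S}.txt"


def collect(outdir, interval_s=300, rounds=None):
    """Start profiling, then save 'info' output every interval_s seconds."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # Return code deliberately not checked, in case profiling is already on.
    subprocess.run(profile_cmd("start"))
    done = 0
    while rounds is None or done < rounds:
        result = subprocess.run(
            profile_cmd("info"), check=True, capture_output=True, text=True
        )
        snapshot_path(outdir, datetime.now()).write_text(result.stdout)
        done += 1
        time.sleep(interval_s)


# Example (run as root on a Gluster node; the path is hypothetical):
# collect("/var/tmp/gluster-profile")
```

Once a VM has hung, "gluster volume profile imagestore stop" turns profiling 
off again, and the snapshots closest to the freeze are the ones to send.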

Thanks,
BR!
Martin

> On 13 May 2019, at 09:21, Krutika Dhananjay <[email protected]> wrote:
> 
> Also, what's the caching policy that qemu is using on the affected VMs?
> Is it cache=none? Or something else? You can get this information from the 
> command line of the qemu-kvm process corresponding to your VM in the ps output.
> 
> -Krutika
> 
> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay <[email protected]> wrote:
> What version of gluster are you using?
> Also, can you capture and share volume-profile output for a run where you 
> manage to recreate this issue?
> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
> Let me know if you have any questions.
> 
> -Krutika
> 
> On Mon, May 13, 2019 at 12:34 PM Martin Toth <[email protected]> wrote:
> Hi,
> 
> there is no healing operation, no peer disconnects, no read-only filesystem. 
> Yes, storage is slow and unavailable for 120 seconds, but why? It's SSD with 
> 10G networking, and performance is good.
> 
> > you'd have its log on qemu's standard output,
> 
> If you mean /var/log/libvirt/qemu/vm.log, there is nothing there. I have been 
> looking for the problem for more than a month and have tried everything, but 
> can’t find anything. Any more clues or leads?
> 
> BR,
> Martin
> 
> > On 13 May 2019, at 08:55, [email protected] wrote:
> > 
> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
> >> Hi all,
> > 
> > Hi
> > 
> >> 
> >> I am running replica 3 on SSDs with 10G networking. Everything works OK, 
> >> but VMs stored in the Gluster volume occasionally freeze with “Task XY 
> >> blocked for more than 120 seconds”.
> >> The only solution is to power off (hard) the VM and then boot it up again. 
> >> I am unable to SSH in or log in on the console; it is probably stuck on 
> >> some disk operation. No error/warning logs or messages are stored in the 
> >> VM's logs.
> >> 
> > 
> > As far as I know this should be unrelated. I get this during heals
> > without any freezes; I think it just means the storage is slow.
> > 
> >> KVM/Libvirt (qemu) uses libgfapi and a FUSE mount to access VM disks on 
> >> the replica volume. Can someone advise how to debug this problem or what 
> >> can cause these issues? 
> >> It’s really annoying. I’ve tried to google everything but nothing came up. 
> >> I’ve tried changing the disk driver from virtio-scsi-pci to virtio-blk-pci, 
> >> but it’s not related.
> >> 
> > 
> > Any chance your gluster goes read-only? Have you checked your gluster
> > logs to see if maybe the nodes lose each other sometimes?
> > /var/log/glusterfs
> > 
> > For libgfapi accesses you'd have its log on qemu's standard output,
> > which might contain the actual error at the time of the freeze.
> > _______________________________________________
> > Gluster-users mailing list
> > [email protected] <mailto:[email protected]>
> > https://lists.gluster.org/mailman/listinfo/gluster-users 
> > <https://lists.gluster.org/mailman/listinfo/gluster-users>
> 
