Hey Erik, I am running a similar setup with no issues, on Ubuntu host systems on HPE DL380 Gen 10. Until recently I ran libvirt/qemu via nfs-ganesha on top of Gluster flawlessly; I have since switched to the native GFAPI implementation, which is poorly documented, with snippets scattered all over the internet.
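In case it helps, the pieces that made GFAPI work for me boil down to roughly the following. Treat it as a sketch rather than a recipe: the volume name, image path and brick host below are placeholders taken from your mail, and the target device and cache mode are simply what I happen to use.

First, allow clients on unprivileged ports, which qemu needs when it does not run as root:

    gluster volume set adminvm server.allow-insecure on

    # and in /etc/glusterfs/glusterd.vol on every server, then restart glusterd:
    option rpc-auth-allow-insecure on

Then point the libvirt domain at the volume directly instead of at the FUSE path, using a network disk:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='gluster' name='adminvm/images/adminvm.img'>
        <host name='172.23.254.181' port='24007'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>

You also need a qemu build that actually ships the gluster block driver; on some distros that is a separate package.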
Although I cannot provide a direct solution for your issue, I suggest trying either nfs-ganesha as a replacement for the FUSE mount (a sample export block is at the bottom of this mail) or GFAPI as sketched above. Happy to share more detailed libvirt/GFAPI config hints to make it happen.

Best, A.

On Sunday, 13.10.2024 at 21:59 +0000, Jacobson, Erik wrote:
> Hello all! We are experiencing a strange problem with QEMU virtual
> machines where the virtual machine image is hosted on a gluster
> volume. Access via fuse. (Our GFAPI attempt failed, it doesn’t seem
> to work properly with current QEMU/distro/gluster). We have the
> volume tuned for ‘virt’.
>
> So we use qemu-img to create a raw image. You can use sparse or
> falloc with equal results. We start a virtual machine (libvirt, qemu-
> kvm) and libvirt/qemu points to the fuse mount with the QEMU image
> file we created.
>
> When we create partitions and filesystems – like you might do for
> installing an operating system – all is well at first. This includes
> a root XFS filesystem.
>
> When we try to re-make the XFS filesystem over the old one, it will
> not mount and will report XFS corruption.
> If you dig into XFS repair, you can find a UUID mismatch between the
> superblock and the log. The log always retains the UUID of the
> original filesystem (the one we tried to replace). Running xfs_repair
> doesn’t truly repair, it just reports more corruption. xfs_db forcing
> to remake the log doesn’t help.
>
> We can duplicate this with even a QEMU raw image of 50 megabytes. As
> far as we can tell, XFS is the only filesystem showing this behavior
> or at least the only one reporting a problem.
>
> If we take QEMU out of the picture and create partitions directly on
> the QEMU raw image file, then use kpartx to create devices to the
> partitions, and run a similar test – the gluster-hosted image behaves
> as you would expect and there is no problem reported by XFS. We can’t
> duplicate the problem outside of QEMU.
>
> We have observed the issue with Rocky 9.4 and SLES15 SP5 environments
> (including the matching QEMU versions). We have not tested more
> distros yet.
>
> We observed the problem originally with Gluster 9.3. We reproduced it
> with Gluster 9.6 and 10.5.
>
> If we switch from QEMU RAW to QCOW2, the problem disappears.
>
> The problem is not reproduced when we take gluster out of the
> equation (meaning, pointing QEMU at a local disk image instead of
> gluster-hosted one – that works fine).
>
> The problem can be reproduced this way:
> * Assume /adminvm/images on a gluster sharded volume
> * rm /adminvm/images/adminvm.img
> * qemu-img create -f raw /adminvm/images/adminvm.img 50M
>
> Now start the virtual machine that refers to the above adminvm.img
> file
> * Boot up a rescue environment or a live mode or similar
> * sgdisk --zap-all /dev/sda
> * sgdisk --set-alignment=4096 --clear /dev/sda
> * sgdisk --set-alignment=4096 --new=1:0:0 /dev/sda
> * mkfs.xfs -L fs1 /dev/sda1
> * mkdir -p /a
> * mount /dev/sda1 /a
> * umount /a
> * # MAKE same FS again:
> * mkfs.xfs -f -L fs1 /dev/sda1
> * mount /dev/sda1 /a
> * This will fail with kernel back traces and corruption reported
> * xfs_repair will report the log vs superblock UUID mismatch I
> mentioned
>
> Here are the volume settings:
>
> # gluster volume info adminvm
>
> Volume Name: adminvm
> Type: Replicate
> Volume ID: de655913-aad9-4e17-bac4-ff0ad9c28223
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.23.254.181:/data/brick_adminvm_slot2
> Brick2: 172.23.254.182:/data/brick_adminvm_slot2
> Brick3: 172.23.254.183:/data/brick_adminvm_slot2
> Options Reconfigured:
> storage.owner-gid: 107
> storage.owner-uid: 107
> performance.io-thread-count: 32
> network.frame-timeout: 10800
> cluster.lookup-optimize: off
> server.keepalive-count: 5
> server.keepalive-interval: 2
> server.keepalive-time: 10
> server.tcp-user-timeout: 20
> network.ping-timeout: 20
> server.event-threads: 4
> client.event-threads: 4
> cluster.choose-local: off
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> performance.strict-o-direct: on
> network.remote-dio: disable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> cluster.granular-entry-heal: enable
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
>
> Any help or ideas would be appreciated. Let us know if we have a
> setting incorrect or have made an error.
>
> Thank you all!
>
> Erik
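PS: if you want to try the nfs-ganesha route first, the core of it is an export block along these lines in ganesha.conf. Again only a sketch: Export_Id, Path, Pseudo and Hostname are placeholders, and the volume name is taken from your mail.

    EXPORT {
        Export_Id = 2;
        Path = "/adminvm";
        Pseudo = "/adminvm";
        Access_Type = RW;
        Squash = No_Root_Squash;
        SecType = "sys";
        FSAL {
            Name = GLUSTER;
            Hostname = "localhost";
            Volume = "adminvm";
        }
    }

Then mount that export on the hypervisor over NFS and point libvirt at the mount, much as you do with the FUSE mount today.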
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users