I've looked in all the places where they should be, and I couldn't find it anywhere. Some people say the dump file is generated where the application is running... well, I don't know where to look then, and I hope they weren't generated on the failed mountpoint.
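As an aside (not from the thread itself): on any Linux host, the kernel's `core_pattern` sysctl is the authoritative answer to "where do core dumps go?". A minimal sketch; the interpretation rules in the comments are general kernel behavior, not anything Gluster- or Debian-specific:

```shell
# Where does this host put core dumps? Read the kernel's core_pattern.
# - a value starting with '|' pipes the dump to a handler (e.g. systemd-coredump)
# - an absolute path is a file name template written by the kernel
# - a bare name is written relative to the crashed process's working directory
pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || echo core)
case "$pattern" in
  \|*) echo "piped to handler: ${pattern#|}" ;;
  /*)  echo "written using absolute path template: $pattern" ;;
  *)   echo "written to the crashed process's cwd as: $pattern" ;;
esac
```

If the pattern is a bare name like `core`, the dump would indeed land in the daemon's working directory, which is why it can be hard to find after the fact.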
As Debian 11 has systemd, I've installed systemd-coredump, so if a new crash happens, at least I will have the exact location and the tool (coredumpctl) to find the dumps, and I can then install the debug symbols, which is particularly tricky on Debian. But I need to wait for it to happen again; for now the tool says there isn't any core dump on the system.

Thank you, Xavi. If this happens again (let's hope it won't), I will report back.

Best regards!

*Angel Docampo*
<https://www.google.com/maps/place/Edificio+de+Oficinas+Euro+3/@41.3755943,2.0730134,17z/data=!3m2!4b1!5s0x12a4997021aad323:0x3e06bf8ae6d68351!4m5!3m4!1s0x12a4997a67bf592f:0x83c2323a9cc2aa4b!8m2!3d41.3755903!4d2.0752021>
<angel.doca...@eoniantec.com> <+34-93-1592929>

On Tue, 22 Nov 2022 at 10:45, Xavi Hernandez (<jaher...@redhat.com>) wrote:

> The crash seems related to some problem in the ec xlator, but I don't have
> enough information to determine what it is. The crash should have generated
> a core dump somewhere in the system (I don't know where Debian keeps the
> core dumps). If you find it, you should be able to open it using this
> command (make sure the debug symbols package is also installed before
> running it):
>
> # gdb /usr/sbin/glusterfs <path to core dump>
>
> And then run this command inside gdb:
>
> (gdb) bt -full
>
> Regards,
>
> Xavi
>
> On Tue, Nov 22, 2022 at 9:41 AM Angel Docampo <angel.doca...@eoniantec.com>
> wrote:
>
>> Hi Xavi,
>>
>> The OS is Debian 11 with the Proxmox kernel. The Gluster packages are the
>> official ones from gluster.org (
>> https://download.gluster.org/pub/gluster/glusterfs/10/10.3/Debian/bullseye/
>> )
>>
>> The system logs showed no other issues at the time of the crash, no OOM
>> kill or anything similar, and no other process was interacting with the
>> gluster mountpoint besides Proxmox.
>>
>> I wasn't running gdb when it crashed, so I don't really know if I can
>> obtain a more detailed trace from the logs, or if there is a simple way to
>> leave it running in the background in case it happens again (or a flag to
>> start the systemd daemon in debug mode).
>>
>> Best,
>>
>> *Angel Docampo*
>>
>> On Mon, 21 Nov 2022 at 15:16, Xavi Hernandez (<jaher...@redhat.com>)
>> wrote:
>>
>>> Hi Angel,
>>>
>>> On Mon, Nov 21, 2022 at 2:33 PM Angel Docampo <
>>> angel.doca...@eoniantec.com> wrote:
>>>
>>>> Sorry for necrobumping this, but this morning I've suffered this on my
>>>> Proxmox + GlusterFS cluster. In the log I can see this:
>>>>
>>>> [2022-11-21 07:38:00.213620 +0000] I [MSGID: 133017]
>>>> [shard.c:7275:shard_seek] 11-vmdata-shard: seek called on
>>>> fbc063cb-874e-475d-b585-f89f7518acdd. [Operation not supported]
>>>> pending frames:
>>>> frame : type(1) op(WRITE)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> frame : type(0) op(0)
>>>> ...
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> frame : type(1) op(FSYNC)
>>>> patchset: git://git.gluster.org/glusterfs.git
>>>> signal received: 11
>>>> time of crash:
>>>> 2022-11-21 07:38:00 +0000
>>>> configuration details:
>>>> argp 1
>>>> backtrace 1
>>>> dlfcn 1
>>>> libpthread 1
>>>> llistxattr 1
>>>> setfsid 1
>>>> epoll.h 1
>>>> xattr.h 1
>>>> st_atim.tv_nsec 1
>>>> package-string: glusterfs 10.3
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x28a54)[0x7f74f286ba54]
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x700)[0x7f74f2873fc0]
>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x38d60)[0x7f74f262ed60]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x37a14)[0x7f74ecfcea14]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x21d59)[0x7f74ecfb8d59]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x22815)[0x7f74ecfb9815]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x377d9)[0x7f74ecfce7d9]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x19414)[0x7f74ecfb0414]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x16373)[0x7f74ecfad373]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x170f9)[0x7f74ecfae0f9]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/cluster/disperse.so(+0x313bb)[0x7f74ecfc83bb]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/xlator/protocol/client.so(+0x48e3a)[0x7f74ed06ce3a]
>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0xfccb)[0x7f74f2816ccb]
>>>> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7f74f2812646]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0x64c8)[0x7f74ee15f4c8]
>>>> /usr/lib/x86_64-linux-gnu/glusterfs/10.3/rpc-transport/socket.so(+0xd38c)[0x7f74ee16638c]
>>>> /lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7971d)[0x7f74f28bc71d]
>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f74f27d2ea7]
>>>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f74f26f2aef]
>>>> ---------
>>>>
>>>> The mount point wasn't accessible, with the "Transport endpoint is not
>>>> connected" message, and it was shown like this:
>>>>
>>>> d????????? ? ? ? ? ? vmdata
>>>>
>>>> I had to stop all the VMs on that Proxmox node, then stop the gluster
>>>> daemon to unmount the directory, and after starting the daemon and
>>>> re-mounting, everything was working again.
>>>>
>>>> My gluster volume info returns this:
>>>>
>>>> Volume Name: vmdata
>>>> Type: Distributed-Disperse
>>>> Volume ID: cace5aa4-b13a-4750-8736-aa179c2485e1
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: g01:/data/brick1/brick
>>>> Brick2: g02:/data/brick2/brick
>>>> Brick3: g03:/data/brick1/brick
>>>> Brick4: g01:/data/brick2/brick
>>>> Brick5: g02:/data/brick1/brick
>>>> Brick6: g03:/data/brick2/brick
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>> storage.fips-mode-rchecksum: on
>>>> features.shard: enable
>>>> features.shard-block-size: 256MB
>>>> performance.read-ahead: off
>>>> performance.quick-read: off
>>>> performance.io-cache: off
>>>> server.event-threads: 2
>>>> client.event-threads: 3
>>>> performance.client-io-threads: on
>>>> performance.stat-prefetch: off
>>>> dht.force-readdirp: off
>>>> performance.force-readdirp: off
>>>> network.remote-dio: on
>>>> features.cache-invalidation: on
>>>> performance.parallel-readdir: on
>>>> performance.readdir-ahead: on
>>>>
>>>> Xavi, do you think the open-behind off setting can help somehow? I did
>>>> try to understand what it does (with no luck), and whether it could
>>>> impact the performance of my VMs (I have the setup you know so well ;))
>>>> I would like to avoid more crashes like this; Gluster 10.3 had been
>>>> working quite well for the last two weeks, until this morning.
>>>>
>>>
>>> I don't think disabling open-behind will have any visible effect on
>>> performance. Open-behind is only useful for small files when the workload
>>> is mostly open + read + close, and quick-read is also enabled (which is
>>> not your case). The only effect it will have is that the latency "saved"
>>> during open is "paid" on the next operation sent to the file, so the
>>> total overall latency should be the same.
>>> Additionally, a VM workload doesn't open files frequently, so it
>>> shouldn't matter much in any case.
>>>
>>> That said, I'm not sure the problem is the same in your case. Based on
>>> the stack of the crash, it seems to be an issue inside the disperse
>>> module.
>>>
>>> What OS are you using? Are you using official packages? If so, which
>>> ones?
>>>
>>> Is it possible to provide a backtrace from gdb?
>>>
>>> Regards,
>>>
>>> Xavi
>>>
>>>> *Angel Docampo*
>>>>
>>>> On Fri, 19 Mar 2021 at 02:10, David Cunningham (<
>>>> dcunning...@voisonics.com>) wrote:
>>>>
>>>>> Hi Xavi,
>>>>>
>>>>> Thank you for that information. We'll look at upgrading it.
>>>>>
>>>>> On Fri, 12 Mar 2021 at 05:20, Xavi Hernandez <jaher...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> With so little information it's hard to tell, but given that there
>>>>>> are several OPEN and UNLINK operations, it could be related to an
>>>>>> already fixed bug (in recent versions) in open-behind.
>>>>>>
>>>>>> You can try disabling open-behind with this command:
>>>>>>
>>>>>> # gluster volume set <volname> open-behind off
>>>>>>
>>>>>> But given that the version you are using is very old and
>>>>>> unmaintained, I would recommend you upgrade to at least 8.x.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>> On Wed, Mar 10, 2021 at 5:10 AM David Cunningham <
>>>>>> dcunning...@voisonics.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a GlusterFS 5.13 server which also mounts itself with the
>>>>>>> native FUSE client. Recently the FUSE mount crashed, and we found the
>>>>>>> following in the syslog. There isn't anything logged in
>>>>>>> mnt-glusterfs.log for that time.
>>>>>>> After killing all processes with a file handle open on the
>>>>>>> filesystem, we were able to unmount and then remount the filesystem
>>>>>>> successfully.
>>>>>>>
>>>>>>> Would anyone have advice on how to debug this crash? Thank you in
>>>>>>> advance!
>>>>>>>
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: pending frames:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(UNLINK)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 3355
>>>>>>> times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 6965
>>>>>>> times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(1) op(OPEN)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: message repeated 4095
>>>>>>> times: [ frame : type(1) op(OPEN)]
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: frame : type(0) op(0)
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: patchset:
>>>>>>> git://git.gluster.org/glusterfs.git
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: signal received: 11
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: time of crash:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: 2021-03-09 03:12:31
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: configuration details:
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: argp 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: backtrace 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: dlfcn 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: libpthread 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: llistxattr 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: setfsid 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: spinlock 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: epoll.h 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: xattr.h 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: st_atim.tv_nsec 1
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: package-string: glusterfs
>>>>>>> 5.13
>>>>>>> Mar 9 05:12:31 voip1 mnt-glusterfs[2932]: ---------
>>>>>>> ...
>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service:
>>>>>>> Main process exited, code=killed, status=11/SEGV
>>>>>>> Mar 9 05:13:50 voip1 systemd[1]: glusterfssharedstorage.service:
>>>>>>> Failed with result 'signal'.
>>>>>>> ...
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service:
>>>>>>> Service hold-off time over, scheduling restart.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: glusterfssharedstorage.service:
>>>>>>> Scheduled restart job, restart counter is at 2.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Stopped Mount glusterfs
>>>>>>> sharedstorage.
>>>>>>> Mar 9 05:13:54 voip1 systemd[1]: Starting Mount glusterfs
>>>>>>> sharedstorage...
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: ERROR: Mount
>>>>>>> point does not exist
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Please specify
>>>>>>> a mount point
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: Usage:
>>>>>>> Mar 9 05:13:54 voip1 mount-shared-storage.sh[20520]: man 8
>>>>>>> /sbin/mount.glusterfs
>>>>>>>
>>>>>>> --
>>>>>>> David Cunningham, Voisonics Limited
>>>>>>> http://voisonics.com/
>>>>>>> USA: +1 213 221 1092
>>>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>>>
>>>>>>
>>>>>
>>>>
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users