Hello,

First of all, the problem at hand is not that the mechanism doesn't work; it is that the NFS file transfer takes too long. From what I see, the NFS mechanism has worked at least partly.
The NFS share was correctly mounted and the coredump transfer was initiated. For some reason, the NFS service started to time out, but kdump-tools doesn't have much to do with that.

One thing did get my attention. The mount command that you issued returns the following (edited for clarity):

# mount
/dev/sda2 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,nodev,noexec,nosuid)
...
9.3.189.84:/nfsshare on /nfsmount type nfs (rw,vers=4,addr=9.3.189.84,clientaddr=9.114.13.128)

The NFS mount on /var/crash does not appear, which is definitely a problem, as this mount is done at a very early stage of the process. It was mounted at the beginning, since there is a dump-incomplete file on the remote NFS server.

I don't have any context on the size of the file to be transferred. Maybe it did bring the kexec-booted kernel to memory exhaustion, but there is no sign of an OOM, which I would expect to see in that situation.

Right now, with the data at hand, I cannot put forward anything other than a lack of availability of the NFS server as the cause of the failure.

** Tags added: cts

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1423483

Title:
  Kdump over network(nfs) does not work

Status in makedumpfile package in Ubuntu:
  Triaged

Bug description:
  Problem Description
  ==========================
  Kdump over network(nfs) does not work

  ---uname output---
  3.18.0-13-generic

  Machine Type = POWER8

  System Hang
  =====================
  The dump process seems to take a lot of time and it takes forever to save the dump. I waited for almost 3 hours, but the dump did not complete.

  Steps to Reproduce
  ===========================
  1) Configure kdump over nfs
  Add the following line to /etc/default/kdump-tools
  NFS="9.3.189.84:/nfsshare"

  2) Load kdump
  root@lop824:~# kdump-config load
  Modified cmdline:BOOT_IMAGE=/boot/vmlinux-3.18.0-13-generic root=UUID=234c5426-796e-4f54-bd77-7b0fe10e0407 ro splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service elfcorehdr=155072K
  segment[0].mem:0x8000000 memsz:24510464
  segment[1].mem:0x9760000 memsz:65536
  segment[2].mem:0x9770000 memsz:65536
  segment[3].mem:0x9780000 memsz:65536
  segment[4].mem:0x9790000 memsz:21954560
  segment[5].mem:0xec70000 memsz:196608
   * loaded kdump kernel

  3) Trigger a dump.
  Kdump boots and starts copying the dump but hangs midway.

  root@lop824:~# ls -lh /nfsmount/9.114.13.128-201502170326/
  total 1.3M
  -rw------- 1 nobody nogroup 27M Feb 17 03:27 dump-incomplete
  root@lop824:~#

  root@lop824:~# kdump-config show
  USE_KDUMP: 1
  KDUMP_SYSCTL: kernel.panic_on_oops=1
  KDUMP_COREDIR: /var/crash
  crashkernel addr:
  NFS: 9.3.189.84:/nfsshare
  HOSTTAG: ip
  current state: ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="BOOT_IMAGE=/boot/vmlinux-3.18.0-13-generic root=UUID=234c5426-796e-4f54-bd77-7b0fe10e0407 ro splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/boot/initrd.img-3.18.0-13-generic /boot/vmlinux-3.18.0-13-generic
  root@lop824:~#

  == Comment: #3 - SACHIN P. SANT <[email protected]> - 2015-02-17 07:17:14 ==
  Following messages are seen while saving a dump

  [ 31.059522] NFS: Registering the id_resolver key type
  [ 31.059542] Key type id_resolver registered
  [ 31.059544] Key type id_legacy registered
  [ 36.021996] nfs: server 9.3.189.84 not responding, timed out
  [ 36.022026] nfs: server 9.3.189.84 not responding, timed out
  [ 36.022049] nfs: server 9.3.189.84 not responding, timed out
  [ 40.530000] nfs: server 9.3.189.84 not responding, timed out
  [ 40.530033] nfs: server 9.3.189.84 not responding, timed out
  [ 45.037994] nfs: server 9.3.189.84 not responding, timed out
  [ 45.038020] nfs: server 9.3.189.84 not responding, timed out
  [ 48.550133] nfs: server 9.3.189.84 not responding, timed out
  [ 48.550161] nfs: server 9.3.189.84 not responding, timed out
  [ 51.557995] nfs: server 9.3.189.84 not responding, timed out
  [ 51.558021] nfs: server 9.3.189.84 not responding, timed out
  [ 55.617018] nfs: server 9.3.189.84 not responding, timed out
  [ 55.617050] nfs: server 9.3.189.84 not responding, timed out
  [ 58.621419] nfs: server 9.3.189.84 not responding, timed out
  [ 58.621447] nfs: server 9.3.189.84 not responding, timed out
  [ 58.621470] nfs: server 9.3.189.84 not responding, timed out
  [ 61.413753] BUG: arch topology borken
  [ 61.413757] the DIE domain not a subset of the NUMA domain
  [ 61.413760] BUG: arch topology borken
  [ 61.413762] the DIE domain not a subset of the NUMA domain
  [ 61.413765] BUG: arch topology borken
  [ 61.413766] the DIE domain not a subset of the NUMA domain
  [ 61.413769] BUG: arch topology borken
  [ 61.413770] the DIE domain not a subset of the NUMA domain
  [ 61.413773] BUG: arch topology borken
  [ 61.413774] the DIE domain not a subset of the NUMA domain
  [ 61.413777] BUG: arch topology borken
  [ 61.413778] the DIE domain not a subset of the NUMA domain
  [ 61.413781] BUG: arch topology borken
  [ 61.413782] the DIE domain not a subset of the NUMA domain
  [ 61.413785] BUG: arch topology borken
  [ 61.413786] the DIE domain not a subset of the NUMA domain
  [ 61.625436] nfs: server 9.3.189.84 not responding, timed out
  [ 66.133424] nfs: server 9.3.189.84 not responding, timed out
  [ 66.133453] nfs: server 9.3.189.84 not responding, timed out
  [ 70.641436] nfs: server 9.3.189.84 not responding, timed out
  [ 70.641465] nfs: server 9.3.189.84 not responding, timed out
  [ 74.149421] nfs: server 9.3.189.84 not responding, timed out
  [ 74.149452] nfs: server 9.3.189.84 not responding, timed out
  [ 78.209471] nfs: server 9.3.189.84 not responding, timed out
  [ 78.209498] nfs: server 9.3.189.84 not responding, timed out
  [ 81.629433] nfs: server 9.3.189.84 not responding, timed out
  [ 81.629442] nfs: server 9.3.189.84 not responding, timed out
  [ 84.633433] nfs: server 9.3.189.84 not responding, timed out
  [ 87.637419] nfs: server 9.3.189.84 not responding, timed out
  [ 90.649450] nfs: server 9.3.189.84 not responding, timed out
  [ 93.653426] nfs: server 9.3.189.84 not responding, timed out
  [ 95.005433] nfs: server 9.3.189.84 not responding, timed out
  [ 96.653426] nfs: server 9.3.189.84 not responding, timed out
  [ 98.009437] nfs: server 9.3.189.84 not responding, timed out

  I can mount the nfs share manually (while the dump is in progress)

  root@lop824:~# mount -t nfs 9.3.189.84:/nfsshare /nfsmount/
  root@lop824:~# mount
  /dev/sda2 on / type ext4 (rw,errors=remount-ro)
  proc on /proc type proc (rw,nodev,noexec,nosuid)
  sysfs on /sys type sysfs (rw,nodev,noexec,nosuid)
  none on /sys/fs/cgroup type tmpfs (rw,uid=0,gid=0,mode=0755,size=1024)
  none on /sys/fs/fuse/connections type fusectl (rw)
  none on /sys/kernel/debug type debugfs (rw)
  none on /sys/kernel/security type securityfs (rw)
  udev on /dev type devtmpfs (rw,mode=0755)
  devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
  tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
  none on /run/lock type tmpfs (rw,nodev,noexec,nosuid,size=5242880)
  none on /run/shm type tmpfs (rw,nosuid,nodev)
  none on /run/user type tmpfs (rw,nodev,noexec,nosuid,size=104857600,mode=0755)
  none on /sys/fs/pstore type pstore (rw)
  cgmfs on /run/cgmanager/fs type tmpfs (rw,relatime,size=128k,mode=755)
  rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
  9.3.189.84:/nfsshare on /nfsmount type nfs (rw,vers=4,addr=9.3.189.84,clientaddr=9.114.13.128)
  root@lop824:~# ls
  root@lop824:~# ls /nfsmount/
  9.114.13.128-201502170326 test
  root@lop824:~# ls /nfsmount/9.114.13.128-201502170326/
  dump-incomplete
  root@lop824:~#
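
The mount listing above shows an NFS entry for /nfsmount but none for /var/crash, which is the point raised in the reply. As a rough sketch only, that check could be scripted as below. The helper names (nfs_mounts, has_nfs_mount_on) are hypothetical and not part of kdump-tools, and the parsing assumes the classic "<source> on <target> type <fstype> (<options>)" layout seen in this report.

```python
import re

# One line of classic `mount` output, as seen in this report:
#   <source> on <target> type <fstype> (<options>)
MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)

# Filesystem types treated as NFS client mounts (an assumption of this sketch).
NFS_TYPES = ("nfs", "nfs4")


def nfs_mounts(mount_output):
    """Return (source, target) pairs for every NFS entry in `mount` output."""
    found = []
    for line in mount_output.splitlines():
        m = MOUNT_RE.match(line.strip())
        if m and m.group("fstype") in NFS_TYPES:
            found.append((m.group("source"), m.group("target")))
    return found


def has_nfs_mount_on(mount_output, target):
    """True if some NFS share is mounted exactly on `target`."""
    return any(t == target for _, t in nfs_mounts(mount_output))


# A few lines copied from the listing in this report.
sample = """\
/dev/sda2 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,nodev,noexec,nosuid)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
9.3.189.84:/nfsshare on /nfsmount type nfs (rw,vers=4,addr=9.3.189.84,clientaddr=9.114.13.128)
"""

print(nfs_mounts(sample))
print(has_nfs_mount_on(sample, "/var/crash"))
```

On a live system the same functions could be fed the text returned by subprocess.run(["mount"], capture_output=True, text=True).stdout, with /var/crash taken from the KDUMP_COREDIR value shown by kdump-config show.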

