Hello Matthijs,

Unfortunately, the best way to prevent this is to fix the kernel hang itself, which occurs when the kernel calls sd_sync_cache() on every configured device before shutdown. There is a single I/O command hanging on all SCSI paths, and the I/O error is never propagated to the block layer (despite iSCSI having proper I/O error settings). I'm finishing the analysis of some kernel dumps so that I can finally understand what is happening in the transport layer (this also happens with more recent kernels).

The workaround was to create a script that restores the iSCSI connection, waits for the login to complete and the paths to come back online, and then logs out cleanly, allowing the sd_sync_cache() operation to finish.

If you are facing this problem, I know for sure that your iSCSI connections are not being closed before the network goes down. This means that you have to pay attention to how you configured your iSCSI disks:

- Make sure iscsiadm was configured with "interfaces" so it works on startup:

    sudo iscsiadm -m iface -I ens4 --op=new -n iface.hwaddress -v 52:54:00:b4:21:bb
    sudo iscsiadm -m iface -I ens7 --op=new -n iface.hwaddress -v 52:54:00:c2:34:1b

- The discovery/login has to be done AFTER the interfaces have been added to iscsiadm:

    sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER1
    sudo iscsiadm -m discovery --op=new --op=del --type sendtargets --portal $SERVER2

    # iscsiadm -m node --loginall=automatic HAS TO WORK or else init scripts will fail

    http://pastebin.ubuntu.com/25894472/

- Configure the volumes in /etc/fstab with the "_netdev" option for systemd unit ordering:

    LABEL=BLUE /blue ext4 defaults,_netdev 0 1
    LABEL=GREEN /green ext4 defaults,_netdev 0 1
    LABEL=PURPLE /purple ext4 defaults,_netdev 0 1
    LABEL=RED /red ext4 defaults,_netdev 0 1
    LABEL=YELLOW /yellow ext4 defaults,_netdev 0 1

You have to make sure the open-iscsi and iscsid systemd units are started after the network is available and stopped before it goes away. That might be your problem, if the configuration above is correct.

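As a quick sanity check for the fstab step above, something like the following could be used. This is only a minimal sketch: the function name check_netdev is made up for illustration, and the mount points you pass in are whatever your own iSCSI volumes use (the /blue, /green, ... names are just the ones from my lab example). It reports mount points that have no fstab entry at all, or whose entry lacks "_netdev".

```shell
#!/bin/sh
# Hypothetical helper (not part of open-iscsi): for each mount point given,
# report if it has no entry in the fstab file or if its options lack _netdev.
# Usage: check_netdev FSTAB_FILE MOUNTPOINT...
check_netdev() {
    fstab=$1
    shift
    for mp in "$@"; do
        awk -v mp="$mp" '
            /^[ \t]*#/ { next }               # skip comment lines
            $2 == mp {                        # second field is the mount point
                seen = 1
                n = split($4, opts, ",")      # fourth field is the option list
                for (i = 1; i <= n; i++)
                    if (opts[i] == "_netdev") ok = 1
                exit                          # first match wins; jump to END
            }
            END {
                if (!seen)    print "no fstab entry: " mp
                else if (!ok) print "missing _netdev: " mp
            }
        ' "$fstab"
    done
}
```

For the example volumes above it would be called as "check_netdev /etc/fstab /blue /green /purple /red /yellow"; no output means every listed mount point has an entry with "_netdev". It only reads the file, so it does not tell you whether the units are actually ordered correctly at shutdown.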
inaddy@iscsihang:~$ systemctl edit --full iscsid.service
inaddy@iscsihang:~$ systemctl edit --full open-iscsi.service

The defaults are:

    [Unit]
    Description=iSCSI initiator daemon (iscsid)
    Documentation=man:iscsid(8)
    Wants=network-online.target remote-fs-pre.target
    Before=remote-fs-pre.target
    After=network.target network-online.target

and

    [Unit]
    Description=Login to default iSCSI targets
    Documentation=man:iscsiadm(8) man:iscsid(8)
    Wants=network-online.target remote-fs-pre.target iscsid.service
    After=network-online.target iscsid.service
    Before=remote-fs-pre.target

So you can see that iscsid.service runs BEFORE open-iscsi.service. In my case, I'm configuring the network using rc-local.service (since this is my lab), and I had to guarantee the ordering there as well.

If, after configuring your system like this, you still face problems, you can use this script:

    http://pastebin.ubuntu.com/25894592/

and provide me with the DEBUG=/.shutdown.log file, created after its execution, attached to this Launchpad case. It's likely that you will have hung iSCSI connections for some reason (service ordering, volumes missing from fstab so that umounts are not done, etc.).

Hope it helps for now.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

Status in linux package in Ubuntu:
  In Progress
Status in open-iscsi package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress
Status in open-iscsi source package in Xenial:
  In Progress
Status in linux source package in Zesty:
  In Progress
Status in open-iscsi source package in Zesty:
  In Progress
Status in linux source package in Artful:
  In Progress
Status in open-iscsi source package in Artful:
  In Progress

Bug description:
  I have 4 servers running the latest 16.04 updates from the development branch (as of right now).
  Each server is connected to NetApp storage using the iscsi software initiator. There are a total of 56 volumes spread across two NetApp arrays. Each volume has 4 paths available to it, which are being managed by device mapper.

  While logged into the iscsi sessions, all I have to do is reboot the server and I get a hang.

  I see a message that says:

  "Reached target Shutdown"

  followed by

  "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

  and then I see 8 lines that say:

  "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
  "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"

  NOTE: the actual values of the *'s differ for each line above.

  This seems like a bug somewhere, but I am unaware of any additional logging that I could turn on to pinpoint the problem.

  Note I also have similar setups that are not doing iscsi and they don't have this problem.

  Here is a screenshot of what I see on the shell when I try to reboot:

  (https://launchpadlibrarian.net/291303059/Screenshot.jpg)

  This is being tracked in NetApp bug tracker CQ number 860251.

  If I log out of all iscsi sessions before rebooting then I do not experience the hang:

  iscsiadm -m node -U all

  We are wondering if this could be some kind of shutdown ordering problem. Like the network devices have already disappeared and then iscsi tries to perform some operation (hence the ping timeouts).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1569925/+subscriptions

