On Tue, Feb 14, 2023 at 1:04 PM Ritesh Raj Sarraf <r...@debian.org> wrote:
> Hello Migo, > > Hello Ritesh Raj Sarraf, On Sun, 2023-02-12 at 11:29 +0100, migo wrote: > > Package: open-iscsi > > Version: 2.1.3-5 > > Severity: normal > > X-Debbugs-Cc: migo.ora...@gmail.com > > > > Dear Maintainer, > > > > I want to apologize if this is not the right place to report this > > issue, but I need somewhere to start. > > > > I've lot of this error messages ind dmesg: > > > > Feb 09 10:56:05 virt2n kernel: connection2:0: detected conn error > > (1015) > > Feb 09 10:56:05 virt2n iscsid[2790]: connection2:0 is operational > > after recovery (1 attempts) > > Feb 09 10:56:05 virt2n iscsid[2790]: connection1:0 is operational > > after recovery (1 attempts) > > Connection dropped and re-established. Looks normal. > Yes I understand that message but WHY this happens, what can be the cause for this? Link is stable and no errors are reported on ifaces connected to SAN switch: ens1f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000 inet 10.11.12.17 netmask 255.255.255.0 broadcast 10.11.12.255 inet6 fe80::9640:c9ff:fe5f:9172 prefixlen 64 scopeid 0x20<link> ether 94:40:c9:5f:91:72 txqueuelen 1000 (Ethernet) RX packets 184466309231 bytes 146589180968929 (133.3 TiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 86121730517 bytes 284905539017306 (259.1 TiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device interrupt 102 memory 0xa5820000-a583ffff ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000 inet 10.11.14.17 netmask 255.255.255.0 broadcast 10.11.14.255 inet6 fe80::9640:c9ff:fe5f:9173 prefixlen 64 scopeid 0x20<link> ether 94:40:c9:5f:91:73 txqueuelen 1000 (Ethernet) RX packets 114453605386 bytes 85545283479332 (77.8 TiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 236069263519 bytes 283187128321034 (257.5 TiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device interrupt 132 memory 0xa5800000-a581ffff > > > Feb 09 10:56:05 virt2n iscsid[2790]: Kernel reported iSCSI connection > > 1:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state > > (3) > > Feb 09 10:56:05 virt2n iscsid[2790]: Kernel reported iSCSI connection > > 2:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state > > (3) > > Feb 09 10:56:10 virt2n iscsid[2790]: connection1:0 is operational > > after recovery (1 attempts) > > Feb 09 10:56:10 virt2n iscsid[2790]: connection2:0 is operational > > after recovery (1 attempts) > > Conn 2:0 recovered but the earlier error it gave looks misleading to > me. What target controller you have ? What there a failover across the > nodes ? How does the target handle the transition period to initiator > queries ? > yes, two paths with failover > > > Feb 09 10:56:28 virt2n kernel: connection1:0: detected conn error > > (1015) > > Feb 09 10:56:28 virt2n kernel: connection2:0: detected conn error > > (1015) > > Feb 09 10:56:28 virt2n iscsid[2790]: Kernel reported iSCSI connection > > 1:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state > > (3) > > Feb 09 10:56:28 virt2n iscsid[2790]: Kernel reported iSCSI connection > > 2:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state > > (3) > > Feb 09 10:56:33 virt2n kernel: connection1:0: detected conn error > > (1015) > > Feb 09 10:56:33 virt2n kernel: connection2:0: detected conn error > > (1015) > > Feb 09 10:56:33 virt2n iscsid[2790]: connection2:0 is operational > > after recovery (1 attempts) > > Feb 09 10:56:33 virt2n iscsid[2790]: connection1:0 is operational > > after recovery (1 attempts) > > Same logs again. > > > ... snipped ... > > > or enventually this: > > > > Feb 10 20:39:39 virt2n kernel: connection1:0: ping timeout of 5 secs > > expired, recv timeout 5, last rx 19077132455, last ping 19077133712, > > now 19077134976 > > Feb 10 20:39:39 virt2n kernel: connection1:0: detected conn error > > (1022) > > Feb 10 20:39:39 virt2n kernel: scsi_io_completion_action: 11 > > callbacks suppressed > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:5: [sdl] tag#122 FAILED > > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:5: [sdl] tag#122 CDB: Test > > Unit Ready 00 00 00 00 00 00 > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:16: [sdah] tag#103 FAILED > > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:16: [sdah] tag#103 CDB: Test > > Unit Ready 00 00 00 00 00 00 > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:24: [sdax] tag#115 FAILED > > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK > > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:24: [sdax] tag#115 CDB: Test > > Unit Ready 00 00 00 00 00 00 > > > > When this happens iscsi opration is interupted for few seconds, > > multipath -ll reports (this is only example, i/o pending is reported > > dor all 30 dm-s: > > > > This all looks normal to me. If you want the errors to be suppressed, > you can increase the timeouts. On the other hand, otherwise, you could > also use dm-multipath on top, like you have shown below. > What can cause this errors? I've multipath setup, but I want to find root cause of this error messages. > > > > > web3v (3600000e00d2c0000002c177200120000) dm-52 FUJITSU,ETERNUS_DXL > > size=500G features='3 queue_if_no_path queue_mode mq' hwhandler='1 > > alua' wp=rw > > `-+- policy='service-time 0' prio=10 status=active > > |- 1:0:0:16 sdah 66:16 active i/o pending running > > `- 2:0:0:16 sdai 66:32 active ready running > > ais_app (3600000e00d2c0000002c1772001f0000) dm-99 FUJITSU,ETERNUS_DXL > > size=500G features='3 queue_if_no_path queue_mode mq' hwhandler='1 > > alua' wp=rw > > `-+- policy='service-time 0' prio=10 status=active > > |- 1:0:0:28 sdbh 67:176 active i/o pending running > > `- 2:0:0:28 sdbf 67:144 active ready running > > > > Looks all correct to me. You already are using feature queue_if_no_path > > > SAN consists of one Fujitsu DX100 S4 storage with two controllers > > connected to LAN switch with two 10Gb fibre links (one from > > controller), each link has its own VLAN configured. Errors reported > > ocures on virtualization host that are connected mutipath with two > > 10Gb fibre links to respective VLANs. Jumbo frames are enabled across > > the way. > > > > Good. As I expected, you do have a 2 node target controller setup. > > > I'll add all neded info upun request. > > > > I've consulted this issue with Fujitsu representative and it seems we > > have all configured right and he advised me to contact Debian > > support. So I'm here and would kindly ask to point me the right > > direction. > > > > Okay!! What behavior do you expect ? What anomaly do you see with the > iSCSI initiator in Debian ? > I expect no errors in logs and drop out free communication with target. I think this is not normal/standard behaviour. This is log from past 1h: [Wed Feb 15 18:17:18 2023] connection1:0: pdu (op 0x5 itt 0xd0000057) rejected. Reason code 0x9 [Wed Feb 15 18:42:10 2023] connection1:0: pdu (op 0x5 itt 0xd000000c) rejected. Reason code 0x9 [Wed Feb 15 18:47:23 2023] connection1:0: pdu (op 0x5 itt 0xd000003f) rejected. Reason code 0x9 [Wed Feb 15 19:05:12 2023] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 19183609844, last ping 19183611097, now 19183612352 [Wed Feb 15 19:05:12 2023] connection1:0: detected conn error (1022) [Wed Feb 15 19:05:12 2023] sd 1:0:0:27: [sdbe] tag#23 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:27: [sdbe] tag#23 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:24: [sdax] tag#12 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:24: [sdax] tag#12 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:0: [sdc] tag#4 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:4: [sdj] tag#66 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:4: [sdj] tag#66 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:5: [sdl] tag#3 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:5: [sdl] tag#3 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:16: [sdah] tag#7 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:16: [sdah] tag#7 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:11: [sdx] tag#5 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:11: [sdx] tag#5 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:19: [sdan] tag#8 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:19: [sdan] tag#8 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:3: [sdh] tag#65 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:3: [sdh] tag#65 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:05:12 2023] sd 1:0:0:15: [sdaf] tag#6 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK [Wed Feb 15 19:05:12 2023] sd 1:0:0:15: [sdaf] tag#6 CDB: Test Unit Ready 00 00 00 00 00 00 [Wed Feb 15 19:13:26 2023] connection1:0: pdu (op 0x5 itt 0xe000005c) rejected. Reason code 0x9 [Wed Feb 15 19:16:52 2023] connection1:0: pdu (op 0x5 itt 0xe000003e) rejected. Reason code 0x9 What does this error mean: connection1:0: pdu (op 0x5 itt 0xe000003e) rejected. Reason code 0x9 This one repeats very often in logs. Thank you very much for your assistance! Kind regards, Milan P.S. Sorry for a later response, gmail clasified this email as a SPAM and I've discovered it only recently. :( > > > > > > -- System Information: > > Debian Release: 11.6 > > APT prefers stable-security > > APT policy: (500, 'stable-security'), (500, 'stable') > > Architecture: amd64 (x86_64) > > > > Kernel: Linux 5.10.0-21-amd64 (SMP w/256 CPU threads) > > Kernel taint flags: TAINT_CPU_OUT_OF_SPEC > > Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), > > LANGUAGE=en_US:en > > Shell: /bin/sh linked to /usr/bin/dash > > Init: systemd (via /run/systemd/system) > > LSM: AppArmor: enabled > > > > Versions of packages open-iscsi depends on: > > ii debconf [debconf-2.0] 1.5.77 > > ii libc6 2.31-13+deb11u5 > > ii libisns0 0.100-3 > > ii libkmod2 28-1 > > ii libmount1 2.36.1-8+deb11u1 > > ii libopeniscsiusr 2.1.3-5 > > ii libssl1.1 1.1.1n-0+deb11u4 > > ii libsystemd0 247.3-7+deb11u1 > > ii udev 247.3-7+deb11u1 > > > > open-iscsi recommends no packages. > > > > open-iscsi suggests no packages. > > > > -- Configuration Files: > > /etc/iscsi/iscsid.conf [Errno 13] Permission denied: > > '/etc/iscsi/iscsid.conf' > > > > -- debconf information: > > open-iscsi/upgrade_even_with_failed_sessions: > > open-iscsi/downgrade_and_break_system: > > open-iscsi/upgrade_recovery_error: > > open-iscsi/remove_even_with_active_sessions: > > > > -- > Ritesh Raj Sarraf | http://people.debian.org/~rrs > Debian - The Universal Operating System >