Bug#1031131: open-iscsi: Lot of iscsi/kernel errors in dmesg with Fujitsu Eternus DX100S4 connected with 2 10Gb ethernet paths with multipathd.

Milan Oravec Wed, 15 Feb 2023 10:24:16 -0800

On Tue, Feb 14, 2023 at 1:04 PM Ritesh Raj Sarraf <r...@debian.org> wrote:


> Hello Migo,
>
>
Hello Ritesh Raj Sarraf,

On Sun, 2023-02-12 at 11:29 +0100, migo wrote:
> > Package: open-iscsi
> > Version: 2.1.3-5
> > Severity: normal
> > X-Debbugs-Cc: migo.ora...@gmail.com
> >
> > Dear Maintainer,
> >
> > I want to apologize if this is not the right place to report this
> > issue, but I need somewhere to start.
> >
> > I've lot of this error messages ind dmesg:
> >
> > Feb 09 10:56:05 virt2n kernel:  connection2:0: detected conn error
> > (1015)
> > Feb 09 10:56:05 virt2n iscsid[2790]: connection2:0 is operational
> > after recovery (1 attempts)
> > Feb 09 10:56:05 virt2n iscsid[2790]: connection1:0 is operational
> > after recovery (1 attempts)
>
> Connection dropped and re-established. Looks normal.
>

Yes I understand that message but WHY this happens, what can be the cause
for this? Link is stable and no errors are reported on ifaces connected to
SAN switch:

ens1f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.11.12.17  netmask 255.255.255.0  broadcast 10.11.12.255
        inet6 fe80::9640:c9ff:fe5f:9172  prefixlen 64  scopeid 0x20<link>
        ether 94:40:c9:5f:91:72  txqueuelen 1000  (Ethernet)
        RX packets 184466309231  bytes 146589180968929 (133.3 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 86121730517  bytes 284905539017306 (259.1 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 102  memory 0xa5820000-a583ffff

ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.11.14.17  netmask 255.255.255.0  broadcast 10.11.14.255
        inet6 fe80::9640:c9ff:fe5f:9173  prefixlen 64  scopeid 0x20<link>
        ether 94:40:c9:5f:91:73  txqueuelen 1000  (Ethernet)
        RX packets 114453605386  bytes 85545283479332 (77.8 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 236069263519  bytes 283187128321034 (257.5 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 132  memory 0xa5800000-a581ffff



>
> > Feb 09 10:56:05 virt2n iscsid[2790]: Kernel reported iSCSI connection
> > 1:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state
> > (3)
> > Feb 09 10:56:05 virt2n iscsid[2790]: Kernel reported iSCSI connection
> > 2:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state
> > (3)
> > Feb 09 10:56:10 virt2n iscsid[2790]: connection1:0 is operational
> > after recovery (1 attempts)
> > Feb 09 10:56:10 virt2n iscsid[2790]: connection2:0 is operational
> > after recovery (1 attempts)
>
> Conn 2:0 recovered but the earlier error it gave looks misleading to
> me. What target controller you have ? What there a failover across the
> nodes ? How does the target handle the transition period to initiator
> queries ?
>

yes, two paths with failover


>
> > Feb 09 10:56:28 virt2n kernel:  connection1:0: detected conn error
> > (1015)
> > Feb 09 10:56:28 virt2n kernel:  connection2:0: detected conn error
> > (1015)
> > Feb 09 10:56:28 virt2n iscsid[2790]: Kernel reported iSCSI connection
> > 1:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state
> > (3)
> > Feb 09 10:56:28 virt2n iscsid[2790]: Kernel reported iSCSI connection
> > 2:0 error (1015 - ISCSI_ERR_DATA_DGST: Data digest mismatch) state
> > (3)
> > Feb 09 10:56:33 virt2n kernel:  connection1:0: detected conn error
> > (1015)
> > Feb 09 10:56:33 virt2n kernel:  connection2:0: detected conn error
> > (1015)
> > Feb 09 10:56:33 virt2n iscsid[2790]: connection2:0 is operational
> > after recovery (1 attempts)
> > Feb 09 10:56:33 virt2n iscsid[2790]: connection1:0 is operational
> > after recovery (1 attempts)
>
> Same logs again.
>
> > ... snipped ...
>
> > or enventually this:
> >
> > Feb 10 20:39:39 virt2n kernel:  connection1:0: ping timeout of 5 secs
> > expired, recv timeout 5, last rx 19077132455, last ping 19077133712,
> > now 19077134976
> > Feb 10 20:39:39 virt2n kernel:  connection1:0: detected conn error
> > (1022)
> > Feb 10 20:39:39 virt2n kernel: scsi_io_completion_action: 11
> > callbacks suppressed
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:5: [sdl] tag#122 FAILED
> > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:5: [sdl] tag#122 CDB: Test
> > Unit Ready 00 00 00 00 00 00
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:16: [sdah] tag#103 FAILED
> > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:16: [sdah] tag#103 CDB: Test
> > Unit Ready 00 00 00 00 00 00
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:24: [sdax] tag#115 FAILED
> > Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
> > Feb 10 20:39:39 virt2n kernel: sd 1:0:0:24: [sdax] tag#115 CDB: Test
> > Unit Ready 00 00 00 00 00 00
> >
> > When this happens iscsi opration is interupted for few seconds,
> > multipath -ll reports (this is only example, i/o pending is reported
> > dor all 30 dm-s:
> >
>
> This all looks normal to me. If you want the errors to be suppressed,
> you can increase the timeouts. On the other hand, otherwise, you could
> also use dm-multipath on top, like you have shown below.
>

What can cause this errors? I've multipath setup, but I want to find root
cause of this error messages.


>
> >
> > web3v (3600000e00d2c0000002c177200120000) dm-52 FUJITSU,ETERNUS_DXL
> > size=500G features='3 queue_if_no_path queue_mode mq' hwhandler='1
> > alua' wp=rw
> > `-+- policy='service-time 0' prio=10 status=active
> >   |- 1:0:0:16 sdah 66:16  active i/o pending running
> >   `- 2:0:0:16 sdai 66:32  active ready running
> > ais_app (3600000e00d2c0000002c1772001f0000) dm-99 FUJITSU,ETERNUS_DXL
> > size=500G features='3 queue_if_no_path queue_mode mq' hwhandler='1
> > alua' wp=rw
> > `-+- policy='service-time 0' prio=10 status=active
> >   |- 1:0:0:28 sdbh 67:176 active i/o pending running
> >   `- 2:0:0:28 sdbf 67:144 active ready running
> >
>
> Looks all correct to me. You already are using feature queue_if_no_path
>
> > SAN consists of one Fujitsu DX100 S4 storage with two controllers
> > connected to LAN switch with two 10Gb fibre links (one from
> > controller), each link has its own VLAN configured. Errors reported
> > ocures on virtualization host that are connected mutipath with two
> > 10Gb fibre links to respective VLANs. Jumbo frames are enabled across
> > the way.
> >
>
> Good. As I expected, you do have a 2 node target controller setup.
>
> > I'll add all neded info upun request.
> >
> > I've consulted this issue with Fujitsu representative and it seems we
> > have all configured right and he advised me to contact Debian
> > support. So I'm here and would kindly ask to point me the right
> > direction.
> >
>
> Okay!! What behavior do you expect ? What anomaly do you see with the
> iSCSI initiator in Debian ?
>

I expect no errors in logs and drop out free communication with target.  I
think this is not normal/standard behaviour.

This is log from past 1h:

[Wed Feb 15 18:17:18 2023]  connection1:0: pdu (op 0x5 itt 0xd0000057)
rejected. Reason code 0x9
[Wed Feb 15 18:42:10 2023]  connection1:0: pdu (op 0x5 itt 0xd000000c)
rejected. Reason code 0x9
[Wed Feb 15 18:47:23 2023]  connection1:0: pdu (op 0x5 itt 0xd000003f)
rejected. Reason code 0x9
[Wed Feb 15 19:05:12 2023]  connection1:0: ping timeout of 5 secs expired,
recv timeout 5, last rx 19183609844, last ping 19183611097, now 19183612352
[Wed Feb 15 19:05:12 2023]  connection1:0: detected conn error (1022)
[Wed Feb 15 19:05:12 2023] sd 1:0:0:27: [sdbe] tag#23 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:27: [sdbe] tag#23 CDB: Test Unit Ready
00 00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:24: [sdax] tag#12 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:24: [sdax] tag#12 CDB: Test Unit Ready
00 00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:0: [sdc] tag#4 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:0: [sdc] tag#4 CDB: Test Unit Ready 00
00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:4: [sdj] tag#66 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:4: [sdj] tag#66 CDB: Test Unit Ready 00
00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:5: [sdl] tag#3 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:5: [sdl] tag#3 CDB: Test Unit Ready 00
00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:16: [sdah] tag#7 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:16: [sdah] tag#7 CDB: Test Unit Ready
00 00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:11: [sdx] tag#5 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:11: [sdx] tag#5 CDB: Test Unit Ready 00
00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:19: [sdan] tag#8 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:19: [sdan] tag#8 CDB: Test Unit Ready
00 00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:3: [sdh] tag#65 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:3: [sdh] tag#65 CDB: Test Unit Ready 00
00 00 00 00 00
[Wed Feb 15 19:05:12 2023] sd 1:0:0:15: [sdaf] tag#6 FAILED Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Wed Feb 15 19:05:12 2023] sd 1:0:0:15: [sdaf] tag#6 CDB: Test Unit Ready
00 00 00 00 00 00
[Wed Feb 15 19:13:26 2023]  connection1:0: pdu (op 0x5 itt 0xe000005c)
rejected. Reason code 0x9
[Wed Feb 15 19:16:52 2023]  connection1:0: pdu (op 0x5 itt 0xe000003e)
rejected. Reason code 0x9

What does this error mean: connection1:0: pdu (op 0x5 itt 0xe000003e)
rejected. Reason code 0x9

This one repeats very often in logs.

Thank you very much for your assistance!

Kind regards,
Milan

P.S. Sorry for a later response, gmail clasified this email as a SPAM and
I've discovered it only recently. :(




> >
> >
> > -- System Information:
> > Debian Release: 11.6
> >   APT prefers stable-security
> >   APT policy: (500, 'stable-security'), (500, 'stable')
> > Architecture: amd64 (x86_64)
> >
> > Kernel: Linux 5.10.0-21-amd64 (SMP w/256 CPU threads)
> > Kernel taint flags: TAINT_CPU_OUT_OF_SPEC
> > Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8),
> > LANGUAGE=en_US:en
> > Shell: /bin/sh linked to /usr/bin/dash
> > Init: systemd (via /run/systemd/system)
> > LSM: AppArmor: enabled
> >
> > Versions of packages open-iscsi depends on:
> > ii  debconf [debconf-2.0]  1.5.77
> > ii  libc6                  2.31-13+deb11u5
> > ii  libisns0               0.100-3
> > ii  libkmod2               28-1
> > ii  libmount1              2.36.1-8+deb11u1
> > ii  libopeniscsiusr        2.1.3-5
> > ii  libssl1.1              1.1.1n-0+deb11u4
> > ii  libsystemd0            247.3-7+deb11u1
> > ii  udev                   247.3-7+deb11u1
> >
> > open-iscsi recommends no packages.
> >
> > open-iscsi suggests no packages.
> >
> > -- Configuration Files:
> > /etc/iscsi/iscsid.conf [Errno 13] Permission denied:
> > '/etc/iscsi/iscsid.conf'
> >
> > -- debconf information:
> >   open-iscsi/upgrade_even_with_failed_sessions:
> >   open-iscsi/downgrade_and_break_system:
> >   open-iscsi/upgrade_recovery_error:
> >   open-iscsi/remove_even_with_active_sessions:
> >
>
> --
> Ritesh Raj Sarraf | http://people.debian.org/~rrs
> Debian - The Universal Operating System
>

Bug#1031131: open-iscsi: Lot of iscsi/kernel errors in dmesg with Fujitsu Eternus DX100S4 connected with 2 10Gb ethernet paths with multipathd.

Reply via email to