Package: iscsitarget
Version: 1.4.20.2-1
Severity: grave
Justification: renders package unusable
We have a system configured as follows:
1 CPU Xeon E5520 quad core
12 GB RAM
24 SATA 1TB disks connected to a 3ware 9650-24M controller
The system (I will call it the storage server) is used, also, as an iscsi
target for a two node red hat cluster
(implemented with Debian machines): the storage server exports an iscsi lun
that contains a clustered LVM volum group.
The two nodes mounts the iscsi target, then use clvmd of red hat cluster to
manage concurrent access to the logical
volumes on the clustered volume group.
The nodes are used for virtualization with xen with the virtual machines stored
in the shared volume group, this allows
for live migration of the virtual machines from one node to the other, each
node has 32 cores and 32 GB ram.
All the three machines (the storage server and the two nodes) are Debian based:
the iscsi target server was upgraded from lenny to squeeze one week ago, the
two nodes are still on Debian lenny.
Tonight and today we experienced two cases of ietd hang-up: while we were doing
two I/O intensive operations on the storage
server we got the following errors:
FIRST OCCURRENCE:
Jun 5 23:19:19 san00 kernel: [334912.274409] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:562949990973952 (Function Complete)
Jun 5 23:19:20 san00 kernel: [334912.634797] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1407374921105920 (Function Complete)
Jun 5 23:19:28 san00 kernel: [334920.660528] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:562949990973952 (Function Complete)
Jun 5 23:19:28 san00 kernel: [334920.660811] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:562949990973952 (Function Complete)
Jun 5 23:19:28 san00 kernel: [334920.661105] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:562949990973952 (Function Complete)
Jun 5 23:19:28 san00 kernel: [334920.661357] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1407374921105920 (Function Complete)
Jun 5 23:19:28 san00 kernel: [334920.661570] iscsi_trgt: Logical Unit Reset
(05) issued on tid:2 lun:0 by sid:562949990973952 (Function Complete)
Jun 5 23:19:58 san00 kernel: [334950.930122] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:844424984461824 (Function Complete)
Jun 5 23:19:59 san00 kernel: [334951.602686] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:1688849914593792 (Function Complete)
...
Jun 5 23:20:31 san00 kernel: [334983.635928] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
Jun 5 23:20:41 san00 kernel: [334993.616383] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
...
Jun 5 23:20:51 san00 kernel: [335003.596884] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
Jun 5 23:21:01 san00 kernel: [335013.577396] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
Jun 5 23:21:01 san00 kernel: [335013.577645] iscsi_trgt: Logical Unit Reset
(05) issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
Jun 5 23:21:11 san00 kernel: [335023.557901] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:1970324874527232 (Function Complete)
Jun 5 23:21:13 san00 kernel: [335025.386242] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:2533274844725760 (Function Complete)
...
Jun 5 23:24:21 san00 kernel: [335213.072102] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:2814749821436416 (Function Complete)
Jun 5 23:24:21 san00 kernel: [335213.075205] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:3659174751568384 (Function Complete)
Jun 5 23:24:32 san00 kernel: [335224.457895] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:24:42 san00 kernel: [335234.437754] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:24:42 san00 kernel: [335234.437873] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:24:52 san00 kernel: [335244.418062] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:24:52 san00 kernel: [335244.418174] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:24:54 san00 kernel: [335246.525572] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:02 san00 kernel: [335254.398446] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:02 san00 kernel: [335254.398685] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:04 san00 kernel: [335256.506021] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:04 san00 kernel: [335256.506153] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:12 san00 kernel: [335264.378719] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:12 san00 kernel: [335264.378939] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:14 san00 kernel: [335266.486590] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:14 san00 kernel: [335266.486822] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:22 san00 kernel: [335274.357588] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:22 san00 kernel: [335274.357798] iscsi_trgt: Logical Unit Reset
(05) issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:24 san00 kernel: [335276.465205] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:24 san00 kernel: [335276.465434] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:25:32 san00 kernel: [335284.339163] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3096224781369856 (Function Complete)
Jun 5 23:25:48 san00 kernel: [335300.095545] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:3377699758080512 (Function Complete)
Jun 5 23:26:02 san00 kernel: [335314.469995] iscsi_trgt: Logical Unit Reset
(05) issued on tid:2 lun:0 by sid:4785074641633792 (Function Complete)
Jun 5 23:26:23 san00 kernel: [335335.475744] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:3940649728279040 (Function Complete)
Jun 5 23:26:24 san00 kernel: [335336.176770] iscsi_trgt: Abort Task (01)
issued on tid:3 lun:0 by sid:4503599681700352 (Function Complete)
Jun 5 23:26:37 san00 kernel: [335349.087260] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:4222124688212480 (Function Complete)
Jun 5 23:26:37 san00 kernel: [335349.087589] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:4222124688212480 (Function Complete)
Jun 5 23:26:37 san00 kernel: [335349.087934] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:4222124688212480 (Function Complete)
Jun 5 23:26:43 san00 kernel: [335355.121186] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:4785074641633792 (Function Complete)
Jun 5 23:26:43 san00 kernel: [335355.121609] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:4785074641633792 (Function Complete)
SECOND OCCURRENCE:
Jun 6 13:31:29 san00 kernel: [385942.228071] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:6473924501897728 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.228256] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.228604] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.228903] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.229276] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.229600] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
Jun 6 13:31:29 san00 kernel: [385942.229948] iscsi_trgt: Abort Task (01)
issued on tid:2 lun:0 by sid:7318349432029696 (Function Complete)
On the first occurrence I was fscking and resizing a file system, on the second
one I was dd-ing a 30G file on a logical volume.
These operations were performed directly on the storage server on a portion of
disk not exported by ietd.
I suspect the problem to be related to the squeeze upgrade, because before
upgrading I never saw such an error.
We have 30+ vms in production on the cluster so this thing is highly
disruptive: every time it happens all the virtual machines hang up,
the file systems get corrupted, we have to restart the cluster, manually fsck
every file system and restart the whole thing...
I have a test cluster with a smaller disk system (8 disks connected to a 3ware
controller), with debian squeeze installed and the same version
of iscsistarget installed (1.4.20.2-1) and on this system the problem was never
seen, but in this case we have not 30 virtual machines working
through iscsi on the system.
Tomorrow I will try to reproduce the problem on the test system by loading it
with some test machines, at least to understand if the problem
arises above a threshold of iscsi workload combined with an i/o intensive
operation on the storage server.
What can we look at on the production system? Could this be a bug? or a
misconfiguration on our side?
Here is /etc/iet/ietd.conf
Target iqn.2002-02.it.midhgard:san00.pxeroot
IncomingUser iscsiuser iscsipassword
OutgoingUser
Lun 0 Path=/dev/vg00/pxeroot,Type=blockio
InitialR2T No
ImmediateData Yes
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 65536
MaxBurstLength 262144
FirstBurstLength 65536
MaxOutstandingR2T 8
HeaderDigest None
DataDigest None
Wthreads 8
QueuedCommands 64
Target iqn.2002-02.it.midhgard:san00.lun0
IncomingUser iscsiuser iscsipassword
OutgoingUser
Lun 0 Path=/dev/vg00/lun0,Type=blockio
InitialR2T No
ImmediateData Yes
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 65536
MaxBurstLength 262144
FirstBurstLength 65536
MaxOutstandingR2T 8
HeaderDigest None
DataDigest None
Wthreads 8
QueuedCommands 64
Target iqn.2002-02.it.midhgard:san00.qdisk
IncomingUser iscsiuser iscsipassword
OutgoingUser
Lun 0 Path=/dev/vg00/qdisk,Type=blockio
InitialR2T No
ImmediateData Yes
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 65536
MaxBurstLength 262144
FirstBurstLength 65536
MaxOutstandingR2T 8
HeaderDigest None
DataDigest None
Wthreads 8
QueuedCommands 64
The first target is not used anymore, the second is the one containing the
clustered vg, the third one is used as quorum disk by red hat cluster.
Any help is appreciated!
Best regards
Massimiliano Ferrero
-- System Information:
Debian Release: 6.0.1
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages iscsitarget depends on:
ii libc6 2.11.2-10 Embedded GNU C Library: Shared lib
ii lsb-base 3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii procps 1:3.2.8-9 /proc file system utilities
Versions of packages iscsitarget recommends:
pn iscsitarget-module <none> (no description available)
Versions of packages iscsitarget suggests:
ii iscsitarget-dkms 1.4.20.2-1 iSCSI Enterprise Target kernel mod
pn iscsitarget-source <none> (no description available)
-- Configuration Files:
/etc/default/iscsitarget changed:
ISCSITARGET_ENABLE=true
/etc/iet/ietd.conf [Errno 13] Permission denied: u'/etc/iet/ietd.conf'
/etc/iet/initiators.allow changed:
ALL 127.0.0.1, 10.212.0.0/16, 10.213.0.0/16
/etc/ietd.conf [Errno 2] No such file or directory: u'/etc/ietd.conf'
/etc/initiators.allow [Errno 2] No such file or directory:
u'/etc/initiators.allow'
/etc/initiators.deny [Errno 2] No such file or directory:
u'/etc/initiators.deny'
-- no debconf information
--
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]