On 06/17/2014 07:43 PM, Hans van Kranenburg wrote:

But I have to leave now, will continue later.

Btw, netapp-linux-community, I kept the Cc in my last update, which is now in the moderation queue of the mailing list. I have since joined the list; I didn't even know it existed before... Read up at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=740701 if you're interested.

This evening I tried to reproduce the problem in a test setup, issuing unmap (discard) requests, using fstrim and mkfs in the following situations:

1. Just /dev/sdf, single path to single lun
2. Use multipath to single lun, /dev/mapper/mpatha
3. Use encryption on top of multipath, /dev/mapper/mpatha_luks
4. Use lvm on top of the encryption, /dev/vg_discard/lv_discard
5. Start using a second equally sized lun on the other netapp controller, multipath to it, put encryption on it, pvcreate it, create a new volume group containing both pvs, create a striped lv.

The sad part of the story is that I could not manage to get my iSCSI connections toasted in any of the test cases yet.
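
Since the test cases stack several device-mapper layers on top of each other, one thing that may be worth ruling out is whether discards are advertised at every layer. A minimal sketch, assuming the usual sysfs attributes queue/discard_max_bytes and queue/discard_granularity are present; the function name discard_limits is made up for illustration and was not part of the test run. A discard_max_bytes of 0 means the device does not accept discards.

```shell
#!/bin/sh
# Print the discard limits the kernel advertises for each named block device.
# Usage: discard_limits SYSFS_ROOT DEV...
# SYSFS_ROOT is normally /sys; it is a parameter so the function can be
# pointed at a copy of the tree. discard_limits is a hypothetical helper name.
discard_limits() {
    root=$1
    shift
    for dev in "$@"; do
        q="$root/block/$dev/queue"
        if [ -r "$q/discard_max_bytes" ]; then
            max=$(cat "$q/discard_max_bytes")
            gran=$(cat "$q/discard_granularity" 2>/dev/null || echo '?')
            # A maximum discard size of 0 means discards are not supported.
            if [ "$max" -gt 0 ]; then state=yes; else state=no; fi
            printf '%s discard_max_bytes=%s discard_granularity=%s discards=%s\n' \
                "$dev" "$max" "$gran" "$state"
        else
            printf '%s no discard information found\n' "$dev"
        fi
    done
}
```

For example, `discard_limits /sys sdf dm-1` would cover the single path and mpatha from the listing below. On systems where lsblk has the -D (--discard) option, `lsblk -D` should show the same limits for the whole stack at once.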

For reference, this is what step 5 looks like:

# multipath -l
mpathb (360a9800042576c32412b4532614a6750) dm-2 NETAPP,LUN
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 36:0:0:1 sdc 8:32  active undef running
  |- 38:0:0:1 sde 8:64  active undef running
  |- 35:0:0:1 sdb 8:16  active undef running
  `- 37:0:0:1 sdd 8:48  active undef running
mpatha (360a9800042577239353f4532614a6339) dm-1 NETAPP,LUN
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 40:0:0:1 sdf 8:80  active undef running
  |- 42:0:0:1 sdi 8:128 active undef running
  |- 39:0:0:1 sdg 8:96  active undef running
  `- 41:0:0:1 sdh 8:112 active undef running

# cryptsetup --verbose --verify-passphrase luksFormat /dev/mapper/mpatha
# cryptsetup --verbose --verify-passphrase luksFormat /dev/mapper/mpathb

# cryptsetup status /dev/mapper/mpatha_luks
/dev/mapper/mpatha_luks is active.
  type:    LUKS1
  cipher:  aes-cbc-essiv:sha256
  keysize: 256 bits
  device:  /dev/mapper/mpatha
  offset:  4096 sectors
  size:    20967424 sectors
  mode:    read/write

# pvcreate /dev/mapper/mpatha_luks
# pvcreate /dev/mapper/mpathb_luks

# pvs
  PV                      VG         Fmt  Attr PSize  PFree
  /dev/mapper/mpatha_luks vg_discard lvm2 a--  10.00g 9.50g
  /dev/mapper/mpathb_luks vg_discard lvm2 a--  10.00g 9.50g

# vgcreate vg_discard /dev/mapper/mpatha_luks /dev/mapper/mpathb_luks
# lvcreate -i 2 -L 10G -n lv_discard --addtag $(hostname) vg_discard
# mkfs.ext4 /dev/vg_discard/lv_discard

or, do something like:

# dd if=/dev/zero of=sparse bs=1048576 seek=1024 count=0
# shred -n 1 -v sparse
# sync
# rm sparse
# sync
# fstrim -v -o 0MB -l 512MB ./
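
One caveat about test cases 3 to 5: dm-crypt drops discard requests unless the mapping was opened with --allow-discards, in which case the crypt line in `dmsetup table` carries the allow_discards option. The commands above do not show how the LUKS devices were opened, so this is only a possible explanation to rule out, not a finding. A minimal sketch of the check; crypt_passes_discards is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the dm table text on stdin contains the allow_discards option,
# i.e. the dm-crypt mapping passes discards down to the device below it.
# crypt_passes_discards is a hypothetical helper name.
crypt_passes_discards() {
    grep -qw 'allow_discards'
}

# Intended use (needs root and an existing mapping, so left commented out):
#   if dmsetup table mpatha_luks | crypt_passes_discards; then
#       echo "mpatha_luks passes discards"
#   else
#       echo "mpatha_luks drops discards"
#   fi
```

If the option turns out to be missing, fstrim and mkfs on top of the encrypted devices would never send an UNMAP to the LUN, which would be one reason those test cases cannot trigger the failure.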

So, conclusions for now:
- This is not easily reproducible. It is not as simple as "you need to have multipath, or this and that, and then do mkfs or fstrim and then it fails". But it's there, and in the production setup I've seen it happen more than once now, yesterday being the case in which we could connect the dots and pinpoint where the actual problem is. (1st time: what the .. just happened, collect logs. 2nd time: different situation, same cause, compare, etc. Blam! It's the iSCSI unmap.)
- I can find very few search hits for this on the web. It does not seem to be a known issue, apart from the OP of this bug and me reporting it. Google for "CDB: Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00".
- There must be something different in the production setup, which runs separately from this one physical test server and its test luns, on the exact same type of hardware, using identical software and identical configuration, but fails all I/O after any UNMAP request. The differences are that the production luns are accessed concurrently from multiple physical servers, that there is a lot more I/O going on at any moment, that there are a lot more logical volumes and a lot more data written to the luns, and so on.
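
For anyone comparing log lines: the CDB quoted above can be decoded by hand. Opcode 0x42 is UNMAP (the kernel prints "Unmap/Read sub-channel" because the same opcode is READ SUB-CHANNEL for optical drives), and bytes 7 and 8 hold the parameter list length. The parameter list is an 8-byte header followed by 16-byte block descriptors. A small sketch of that arithmetic; unmap_param_len is a name made up for illustration.

```shell
#!/bin/sh
# Decode the parameter list length of a 10-byte UNMAP CDB given as a string
# of space-separated hex bytes, and derive the number of block descriptors.
# unmap_param_len is a hypothetical helper name.
unmap_param_len() {
    # Split the argument into one positional parameter per byte (intended).
    set -- $1
    if [ "$#" -ne 10 ]; then
        echo "expected 10 CDB bytes, got $#" >&2
        return 1
    fi
    if [ "$1" != "42" ]; then
        echo "opcode $1 is not UNMAP (42)" >&2
        return 1
    fi
    # Bytes 7 and 8 (zero-based) are the big-endian parameter list length.
    len=$(( 0x$8$9 ))
    # 8-byte header, then one 16-byte descriptor per LBA range.
    if [ "$len" -ge 8 ]; then
        descs=$(( (len - 8) / 16 ))
    else
        descs=0
    fi
    echo "param_list_len=$len descriptors=$descs"
}

unmap_param_len "42 00 00 00 00 00 00 00 18 00"
# prints: param_list_len=24 descriptors=1
```

So the failing request carries a single block descriptor, i.e. one contiguous LBA range to unmap. The range itself is in the data-out payload, not in the CDB, so it cannot be recovered from this log line.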

Any ideas?

--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | [email protected] | www.mendix.com

