------- Comment From jan.hoepp...@de.ibm.com 2020-10-15 13:47 EDT-------
(In reply to comment #34)
> So I took the time to re-test this again.
> My z/VM guest has 4 CPUs (but SMT on), and 4 DASD FBA devices that equally
> split a 64GB zFCP/SCSI LUN in 4 16GB FBA chunks.
>
> I've tested (in comment #8) with 2GB RAM where things worked and I wasn't
> able to recreate the error situation.
> I then moved to 6GB RAM and things still worked for me.
> Then 8GB - where everything was still fine.
> And finally 10GB - still don't see the issue.
>
> $ grep -i 'error\|crash\|crit\|panic\|I\/O\|erp\|sense\|fba' /var/log/syslog
> Jul 28 10:05:23 hwe0005 systemd[1]: Stopping LSB: automatic crash report
> generation...
> Jul 28 10:05:23 hwe0005 systemd[1]: Stopping Configure dump on panic for
> System z...
> Jul 28 10:07:36 hwe0005 systemd-udevd[514]: dasd-fba:
> /etc/udev/rules.d/41-generic-ccw-0.0.0009.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0007/0.0.0009/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[511]: 0.0.0102:
> /etc/udev/rules.d/41-dasd-fba-0.0.0102.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0001/0.0.0102/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0101:
> /etc/udev/rules.d/41-dasd-fba-0.0.0101.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0000/0.0.0101/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[522]: 0.0.0103:
> /etc/udev/rules.d/41-dasd-fba-0.0.0103.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0002/0.0.0103/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 systemd-udevd[505]: 0.0.0104:
> /etc/udev/rules.d/41-dasd-fba-0.0.0104.rules:7 Failed to write
> ATTR{/sys/devices/css0/0.0.0003/0.0.0104/online}, ignoring: Invalid argument
> Jul 28 10:07:36 hwe0005 kernel: [    4.983272] dasd-fba.f36f2f: 0.0.0101:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [    4.988020] dasd-fba.f36f2f: 0.0.0102:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [    4.990317] dasd-fba.f36f2f: 0.0.0103:
> New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 kernel: [    4.992370] dasd-fba.f36f2f: 0.0.0104:
> New FBA DASD 9336/10 (CU 6310/80) with 16384 MB and 512 B/blk
> Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Process
> error reports when automatic reporting is enabled (file watch) being skipped.
> Jul 28 10:07:36 hwe0005 systemd[1]: Condition check resulted in Unix socket
> for apport crash forwarding being skipped.
> Jul 28 10:07:36 hwe0005 systemd[1]: Starting LSB: automatic crash report
> generation...
> Jul 28 10:07:36 hwe0005 systemd[1]: Starting Configure dump on panic for
> System z...
> Jul 28 10:07:36 hwe0005 apport[764]:  * Starting automatic crash report
> generation: apport
> Jul 28 10:07:36 hwe0005 dumpconf[770]: stop on panic configured.
> Jul 28 10:07:36 hwe0005 systemd[1]: Finished Configure dump on panic for
> System z.
> Jul 28 10:07:36 hwe0005 systemd[1]: Started LSB: automatic crash report
> generation.
>
> I'm wondering a bit about the systemd msgs and the sysfs device tree. But
> other than that no ERP, sense, or panics so far ...
>
> $ dmesg | grep -i 'error\|fail\|crash\|warn\|crit\|panic\|erp\|fba'
> [    4.983272] dasd-fba.f36f2f: 0.0.0101: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [    4.988020] dasd-fba.f36f2f: 0.0.0102: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [    4.990317] dasd-fba.f36f2f: 0.0.0103: New FBA DASD 9336/10 (CU 6310/80)
> with 16383 MB and 512 B/blk
> [    4.992370] dasd-fba.f36f2f: 0.0.0104: New FBA DASD 9336/10 (CU 6310/80)
> with 16384 MB and 512 B/blk
> [    5.075981] random: 7 urandom warning(s) missed due to ratelimiting
>
> I always did a quick check of the partition data:
>
> ubuntu@hwe0005:~$ sudo fdisk -l /dev/dasde1
> Disk /dev/dasde1: 15.102 GiB, 17178902528 bytes, 33552544 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
>
> And then created an ext3 file system using -F on all 4 FBA devices, one after
> the other:
>
> ubuntu@hwe0005:~$ sudo mkfs.ext3 -F /dev/dasde1
> mke2fs 1.45.5 (07-Jan-2020)
> /dev/dasde1 contains a ext3 file system
> created on Tue Jul 28 09:45:37 2020
> Discarding device blocks: done
> Creating filesystem with 4194068 4k blocks and 1048576 inodes
> Filesystem UUID: c34e7583-1dc9-4b8a-8494-7a100338a7e6
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
> 4096000
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> Does it have a dependency on a certain z/VM version?
>
> And I'm running this z/VM version:
> 00: CP Q CPLEVEL
> 00: z/VM Version 6 Release 4.0, service level 1901 (64-bit)
> 00: Generated at 2019-06-14 14:15:49 UTC
>
> Do the FBA devices always have to be re-enabled before retrying?
>
> Right now I'm a bit lost re-creating this.
>
> @Jan, what did your system and FBAs look like? And which z/VM version are
> you using?

Hi,
(sorry for this late answer)

This is the setup I used to recreate the issue:

root@m3529007:~# vmcp q edev 1411 details
EDEV 1411 TYPE FBA ATTRIBUTES SCSI
VENDOR: IBM PRODUCT: 2107900 REVISION: 1060
BLOCKSIZE: 512 NUMBER OF BLOCKS: 41943040
PATHS:
FCP_DEV: 1907 WWPN: 500507630708C5E3 LUN: 4002404B00000000
CONNECTION TYPE: SWITCHED STATUS: ONLINE
EQID: 6005076307FFC5E300000000000002F4C200000000027FFFFF
SERIAL NUMBER: 75DL241024B
root@m3529007:~# free -m
total        used        free      shared  buff/cache   available
Mem:           7629         260        6978           0         390        7257
Swap:             0           0           0

root@m3529007:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

root@m3529007:~# vmcp q cplevel
z/VM Version 7 Release 1.0, service level 2001 (64-bit)
Generated at 04/14/20 13:25:21 CES
IPL at 04/14/20 13:44:10 CES

root@m3529007:~# uname -r
5.4.0-42-generic
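
As a side note, the EDEV size follows from the BLOCKSIZE and NUMBER OF BLOCKS
values in the "q edev" output above. A minimal Python sketch, assuming the
output keeps the layout shown here (the helper name edev_capacity_bytes is
made up for illustration and is not part of any tool):

```python
import re


def edev_capacity_bytes(query_output: str) -> int:
    """Return the device capacity in bytes from 'q edev ... details' text.

    Assumes a line of the form 'BLOCKSIZE: <n> NUMBER OF BLOCKS: <m>',
    as in the output quoted in this mail.
    """
    match = re.search(
        r"BLOCKSIZE:\s*(\d+)\s+NUMBER OF BLOCKS:\s*(\d+)", query_output
    )
    if match is None:
        raise ValueError("no BLOCKSIZE / NUMBER OF BLOCKS line found")
    block_size, block_count = (int(group) for group in match.groups())
    return block_size * block_count


sample = "BLOCKSIZE: 512 NUMBER OF BLOCKS: 41943040"
capacity = edev_capacity_bytes(sample)
# 512 * 41943040 = 21474836480 bytes, i.e. 20.0 GiB
print(capacity, capacity / 2**30)
```

So the EDEV used here is a 20 GiB device.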

To re-create the problem, make sure that the patch provided here is not applied
(any of the kernel versions mentioned in this bug should be sufficient) and that
the guest has more than 2GB of memory available (4GB or more might be ideal).

I then simply created an EDEV, attached it, and tried to create a filesystem on
it (mkfs.ext4 /dev/dasdb1). This crashes immediately, because mkfs.ext4 issues
discard I/O right away.
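
The memory precondition can be checked against "free -m" output, for example
with the sketch below. The 2048 MiB threshold only mirrors the "more than 2GB"
condition mentioned above, and the function names are hypothetical:

```python
def mem_total_mib(free_output: str) -> int:
    """Return the 'total' column of the 'Mem:' row from 'free -m' output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[1])
    raise ValueError("no 'Mem:' line found")


def enough_memory_for_repro(free_output: str, threshold_mib: int = 2048) -> bool:
    """True if the guest has more memory than the assumed 2GB threshold."""
    return mem_total_mib(free_output) > threshold_mib


# 'free -m' output as quoted in this mail
sample = """\
total        used        free      shared  buff/cache   available
Mem:           7629         260        6978           0         390        7257
Swap:             0           0           0
"""
print(mem_total_mib(sample), enough_memory_for_repro(sample))
```

This only covers the memory condition. Whether the kernel carries the fix
still has to be checked separately.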

The issue is finally resolved by this upstream commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=709192d531e5b0a91f20aa14abfe2fc27ddd47af

A similar problem with the same root cause was reported here by the way:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867118

I hope this clears things up.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1879707

Title:
  [UBUNTU 20.04] mke2fs dasd(fba),Failing CCW,default ERP has run out of
  retries and failed

Status in Ubuntu on IBM z Systems:
  Incomplete
Status in linux package in Ubuntu:
  New

Bug description:
  mke2fs,dasd(fba) guest edevices FBA,default ERP has run out of retries and
  failed,Failing CCW
   
  ---uname output---
  xxxxxx -  5.4.0-29-generic #33-Ubuntu SMP Wed Apr 29 14:27:18 UTC 2020 s390x s390x s390x GNU/Linux
   
  Machine Type = IBM 3906 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   mke2fs to dasd(fba) devices
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  -Post a private note with access information to the machine that the bug is
  occurring on.
  -Attach sysctl -a output to the bug.

  dasd(fba),Failing CCW,default ERP has run out of retries and failed between
  the following syslog events,
  mke2fs running, before mounting and starting IO to dasd(fba) devices

  May 14 14:33:32 ilabg13 root: ILAB_IO_FROM_MSDI_START
  May 14 14:48:34 ilabg13 root: ILAB_IO_FROM_MSDI_RUNNING

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1879707/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
