------- Comment From heinz-werner_se...@de.ibm.com 2018-12-03 11:05 EDT-------
IBM bugzilla status-> closed, Fix Released with Bionic, Cosmic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1797367

Title:
  Ubuntu 18.04.1 - [s390x] Kernel panic while stressing network bonding

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification ==

  While running a series of network stress tests on a bond device on Ubuntu
  18.04.1 with kernel 4.15.0-36.39, a kernel panic is observed (it also
  occurs on non-bond devices).
  This looks like a race between disabling a qeth device and accessing debugfs.
  The problem is critical: it reproducibly leads to a crash, sooner or later.

  == Fix ==

  e19e5be8b4ca ("s390/qeth: sanitize strings in debug messages")

  pre-reqs:
  750b162 ("s390/qeth: reduce hard-coded access to ccw channels")
  d857e11 ("s390/qeth: remove outdated portname debug msg")
  9d0a58f ("s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]")
  8174aa8 ("s390/qeth: consolidate qeth MAC address helpers")
  4641b02 ("s390/qeth: don't keep track of MAC address's cast type")

  == Regression Potential ==

  Low, because:

  - limited to s390x
  - further limited to the qeth driver
  - patches a problem identified during testing
  - fix was tested by IBM before submission

  == Test Case ==

  run:
     #!/bin/bash
     var=0
     while :
     do
          var=$((var + 1))
          echo "DBG count is $var"
          mkdir /tmp/DBGINFO
          dbginfo.sh -d /tmp/DBGINFO
          rm -rf /tmp/DBGINFO*
          echo "chzdev now is $var"
          chzdev -e <qeth device>
          chzdev -d <qeth device>
     done
  On average a crash happens in fewer than 20 cycles (usually fewer than 10).
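
  When checking a fixed kernel, a bounded variant of the loop can be more
  convenient than an endless one. The following is only a sketch under
  stated assumptions: the function name run_cycles, the cycle limit and the
  override variables DBGINFO_CMD and CHZDEV_CMD are hypothetical and not
  part of the original test case. They exist so the loop can be dry-run
  without s390x hardware. On a real system the defaults call dbginfo.sh
  and chzdev exactly as the script above does.

```shell
#!/bin/bash
# Bounded variant of the reproducer: run a fixed number of
# dbginfo.sh + chzdev enable/disable cycles.
# Assumptions (not from the bug report): run_cycles, DBGINFO_CMD and
# CHZDEV_CMD are illustrative names; a default of 20 cycles is chosen
# because the report says a crash usually happens within 20 cycles.
run_cycles() (
    device=$1
    max_cycles=${2:-20}
    cycle=1
    while [ "$cycle" -le "$max_cycles" ]; do
        echo "cycle $cycle of $max_cycles"
        # Collect debug data (this reads the debugfs files), then clean up.
        workdir=$(mktemp -d /tmp/DBGINFO.XXXXXX) || exit 1
        "${DBGINFO_CMD:-dbginfo.sh}" -d "$workdir"
        rm -rf "$workdir"
        # Enable and disable the qeth device.
        "${CHZDEV_CMD:-chzdev}" -e "$device"
        "${CHZDEV_CMD:-chzdev}" -d "$device"
        cycle=$((cycle + 1))
    done
    # Only reached if the kernel did not panic during the run.
    echo "survived $max_cycles cycles"
)
```

  Usage would be something like: run_cycles <qeth device> 50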

  __________

  == Comment: #0 - Athira Rajeev
  ---Problem Description---
  While running a series of stress tests for network bonding on Ubuntu 18.04.1
  with kernel 4.15.0-36.39, a kernel panic is observed.
  Two instances of the panic were seen with the same test procedure, one of
  which points to a kernel BUG.

  Contact Information = Athira Rajeev <atraj...@in.ibm.com>, Waiki
  Wright < wa...@us.ibm.com >

  ---uname output---
  #39-Ubuntu SMP Mon Sep 24 16:13:24 UTC 2018 4.15.0-36.39

  Machine Type = This issue is observed on a z13 system

  ---Debugger---
  A debugger was configured.

  ---Steps to Reproduce---
  This happens while running stress tests for network bonding. A kernel memory
  exposure attempt is detected and BUG() is called from mm/usercopy.c:72.
  kdump was configured and a crash dump is available.
  Results of a few crash commands such as bt and log are added in the
  attachment.

  Relevant part of dmesg pointing to kernel BUG

  <<>>
  [14746.977364] kernel BUG at /build/linux-PABIrW/linux-4.15.0/mm/usercopy.c:72!
  [14746.977377] illegal operation: 0001 ilc:1 [#1] SMP
  [14746.977378] Modules linked in: macsec vsock_diag vsock sctp_diag sctp 
dccp_diag dccp tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag 
netlink_diag bonding binfmt_misc qeth_l3 8021q garp mrp stp llc xt_tcpudp 
qeth_l2 nf_conntrack_ipv6 nf_defrag_ipv6 scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
s390_trng ghash_s390 prng sha512_s390 sha256_s390 sha1_s390 sha_common chsc_sch 
eadm_sch nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c 
crc32_vx_s390 qeth ccwgroup ip6table_filter ip6_tables vfio_ccw vfio_mdev mdev 
vfio_iommu_type1 vfio iptable_filter sch_fq_codel ip_tables x_tables aes_s390 
des_s390 des_generic dm_crypt dm_service_time dm_multipath zfcp 
scsi_transport_fc qdio dasd_eckd_mod dasd_mod btrfs xor zstd_compress raid6_pq 
zlib_deflate
  [14746.977401] CPU: 1 PID: 20905 Comm: dump2tar Tainted: G           OE    4.15.0-36-generic #39-Ubuntu
  [14746.977403] Hardware name: IBM 3906 M02 757 (LPAR)
  [14746.977404] Krnl PSW : 000000000f2d230d 000000006abe14d5 (__check_object_size+0x15a/0x1e0)
  [14746.977408]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 
RI:0 EA:3
  [14746.977410] Krnl GPRS: 0000000000000002 0000000000e95334 0000000000000064 
00000001e6518828
  [14746.977412]            000000000037cc8e 0000000000000000 0000000000a9577c 
0000000000000000
  [14746.977413]            000000000000647b 00000001d8c120a8 0000000000000001 
0000000000008088
  [14746.977433]            00000001d8c0a020 000000000090da38 000000000037cc8e 
000000016fdfbcd0
  [14746.977440] Krnl Code: 000000000037cc82: c0200038ef69        larl    
%r2,a9ab54
                            000000000037cc88: c0e5fff32838        brasl   
%r14,1e1cf8
                           #000000000037cc8e: a7f40001            brc     
15,37cc90
                           >000000000037cc92: e330d0080004        lg      
%r3,8(%r13)
                            000000000037cc98: e320d0000004        lg      
%r2,0(%r13)
                            000000000037cc9e: ecc2001a4065        clgrj   
%r12,%r2,4,37ccd2
                            000000000037cca4: b9040013            lgr     
%r1,%r3
                            000000000037cca8: ec31ff868064        cgrj    
%r3,%r1,8,37cbb4
  [14746.977458] Call Trace:
  [14746.977460] ([<000000000037cc8e>] __check_object_size+0x156/0x1e0)
  [14746.977462]  [<000000000010ac40>] debug_output+0x150/0x2f8
  [14746.977464]  [<00000000004e02c0>] full_proxy_read+0x80/0xe0
  [14746.977466]  [<0000000000382592>] vfs_read+0x8a/0x150
  [14746.977467]  [<0000000000382b2e>] SyS_read+0x66/0xe0
  [14746.977469]  [<00000000008e3c94>] system_call+0xd8/0x2c8
  [14746.977470] Last Breaking-Event-Address:
  [14746.977472]  [<000000000037cc8e>] __check_object_size+0x156/0x1e0
  [14746.977473]
  <<>>
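
  To compare call traces from several dumps, it can help to reduce each one
  to its list of function names. The helper below is a minimal sketch, not
  something used in the original analysis; the name call_trace_symbols is
  hypothetical. It assumes the s390 dmesg layout shown above, where the
  trace sits between "Call Trace:" and "Last Breaking-Event-Address:".

```shell
# Print the function names of a "Call Trace:" section read from stdin.
# call_trace_symbols is an illustrative name, not an existing tool.
call_trace_symbols() {
    # Keep only the lines from "Call Trace:" up to the
    # "Last Breaking-Event-Address:" header, then strip the timestamp,
    # the [<address>] part and the +0xoffset/0xsize suffix.
    sed -n '/Call Trace:/,/Last Breaking-Event-Address/p' |
        sed -n 's/.*\[<[0-9a-f]*>] \([A-Za-z0-9_.]*\)+0x.*/\1/p'
}
```

  Fed with the excerpt above (for example via dmesg | call_trace_symbols),
  this lists __check_object_size, debug_output, full_proxy_read, vfs_read,
  SyS_read and system_call, that is, a read() on a debugfs file that ends
  up in debug_output.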

  Adding one more occurrence below, a panic_on_oops, which appears to
  correlate with the one above.

  Stack trace output:
  Available traces added below

  Oops output:
  [ 2140.467261] 8021q: adding VLAN 0 to HW filter on device bond0
  [ 2140.467979] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
  [ 2140.471609] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
  [ 2140.471610] 8021q: adding VLAN 0 to HW filter on device bond0
  [ 2140.472797] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
  [ 2143.278986] Unable to handle kernel pointer dereference in virtual kernel address space
  [ 2143.278991] Failing address: 7379732f6b657000 TEID: 7379732f6b657803
  [ 2143.278993] Fault in home space mode while using kernel ASCE.
  [ 2143.278996] AS:0000000000ea0007 R3:0000000000000024
  [ 2143.279052] Oops: 0038 ilc:3 [#1] SMP
  [ 2143.279055] Modules linked in: bonding 8021q garp mrp stp llc qeth_l3 
binfmt_misc macsec vsock_diag vsock sctp_diag sctp dccp_diag dccp tcp_diag 
udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag xt_tcpudp 
qeth_l2 nf_conntrack_ipv6 nf_defrag_ipv6 scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c 
crc32_vx_s390 ghash_s390 prng sha512_s390 sha256_s390 sha1_s390 sha_common 
chsc_sch eadm_sch ip6table_filter ip6_tables qeth ccwgroup vfio_ccw vfio_mdev 
mdev vfio_iommu_type1 vfio iptable_filter sch_fq_codel ip_tables x_tables 
aes_s390 des_s390 des_generic dm_crypt dm_service_time dm_multipath zfcp 
scsi_transport_fc qdio dasd_eckd_mod dasd_mod btrfs xor zstd_compress raid6_pq 
zlib_deflate
  [ 2143.279099] CPU: 16 PID: 172270 Comm: dump2tar Tainted: G           OE    4.15.0-36-generic #39-Ubuntu
  [ 2143.279100] Hardware name: IBM 2964 NC9 7A5 (LPAR)
  [ 2143.279102] Krnl PSW : 00000000d3630b5f 00000000af8614fc (debug_output+0x188/0x2f8)
  [ 2143.279108]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 
RI:0 EA:3
  [ 2143.279110] Krnl GPRS: 0000000000010000 ffffffff000002d8 7379732f6b65726e 
00000001db91a020
  [ 2143.279112]            0000000000000000 0000000000ea4ac8 00000001db91a020 
00000000000009d2
  [ 2143.279135]            0000000000000fe5 00000001000ff9ed 00000000000009d2 
00000000000009d2
  [ 2143.279137]            00000001db91a000 00000001db91a020 000000000010ac54 
00000001d16cbd30
  [ 2143.279146] Krnl Code: 000000000010ac68: 5810c010        l   %r1,16(%r12)
                            000000000010ac6c: ec180063ff7e    cij 
%r1,-1,8,10ad32
                           #000000000010ac72: e320c8280004    lg  %r2,2088(%r12)
                           >000000000010ac78: e33020300002    ltg %r3,48(%r2)
                            000000000010ac7e: a784008f        brc 8,10ad9c
                            000000000010ac82: 5a102028        a   %r1,40(%r2)
                            000000000010ac86: 5010c010        st  %r1,16(%r12)
                            000000000010ac8a: a7391000        lghi    %r3,4096
  [ 2143.279167] Call Trace:
  [ 2143.279169] ([<000000000010ac40>] debug_output+0x150/0x2f8)
  [ 2143.279172]  [<00000000004e02c4>] full_proxy_read+0x84/0xe0
  [ 2143.279175]  [<0000000000382592>] vfs_read+0x8a/0x150
  [ 2143.279177]  [<0000000000382b2e>] SyS_read+0x66/0xe0
  [ 2143.279180]  [<00000000008e3c98>] system_call+0xdc/0x2c8
  [ 2143.279182] Last Breaking-Event-Address:
  [ 2143.279184]  [<00000000008e7614>] __s390_indirect_jump_r14+0x0/0xc
  [ 2143.279185]
  [ 2143.279187] Kernel panic - not syncing: Fatal exception: panic_on_oops
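
  The failing address in this oops, 7379732f6b657000, is the page-aligned
  form of the value in %r2 (7379732f6b65726e), which the faulting
  instruction ltg %r3,48(%r2) uses as its base. Every byte of that value is
  printable ASCII, so decoding it is a quick sanity check. The helper below
  is a small sketch; the name hex_to_ascii is hypothetical.

```shell
# Decode a string of hex digits (two per byte) into ASCII text.
# hex_to_ascii is an illustrative helper, not part of any existing tool.
hex_to_ascii() {
    # Split the input into two-digit bytes, convert each byte to a
    # three-digit octal escape and let printf emit the character.
    printf '%s\n' "$1" | fold -w 2 | while read -r byte; do
        printf "\\$(printf '%03o' "0x$byte")"
    done
    echo
}

hex_to_ascii 7379732f6b65726e    # prints: sys/kern
```

  So the "pointer" being dereferenced is really a piece of text. One
  possible reading, which the report itself does not confirm, is that
  debug_output picked up a pointer from memory that by then held string
  data, which would fit the race between disabling the qeth device and
  reading debugfs noted in the SRU justification.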

  System Dump Location:
  kdump was configured and a crash dump is available. Since the crash dump is
  too large to add as a bugzilla attachment, the results of a few crash
  commands such as bt and log will be added as an attachment.

  == Comment: #5 - Athira Rajeev
  Hi,

  Since the crash dump was too large to add as a bugzilla attachment, the
  results of a few crash commands such as bt and log were added in the
  attachment. Please let me know where to upload the dump files, if they
  are required.

  Thanks
  Athira

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1797367/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
