This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1614565

Title:
  ISST-LTE:pKVM311:lotg5:Ubutu16041:lotg5 crashed @
  writeback_sb_inodes+0x30c/0x590

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #0 - PRIYA M. A <priya...@in.ibm.com> - 2016-06-17 10:01:28 ==
  Problem Description:
  ================
  - lotg5 crashed at writeback_sb_inodes+0x30c/0x590

  Steps to re-create:
  ==============
  - Install lotg5 with Ubuntu16041(4.4.0-24-generic)
  - Start the regression tests in lotg5
  Logs:
  ====
  root@lotg5:~# show.report.py
  HOSTNAME    KERNEL VERSION      DISTRO INFO
  --------    ----------------    -----------
  lotg5       4.4.0-24-generic    Ubuntu 16.04 LTS \n \l

  ######## Current Time: Fri Jun 17 01:10:46 2016 ########
  Job-ID  FOCUS   Start-Time              Duration                Function
  ------  -----   ----------              --------                --------
  1       BASE    20160614-05:50:19       67.0 hr(s) 20.0 min(s)  Test
  2       IO      20160614-05:50:26       67.0 hr(s) 20.0 min(s)  IO_Focus
  3       NFS     20160614-06:24:35       66.0 hr(s) 46.0 min(s)  DistributeFS_Testing
  4       TCP     20160614-06:32:03       66.0 hr(s) 38.0 min(s)  networkTest2lotg3

  FOCUS           BASE    IO      NFS     TCP     SUM
  TOTAL           48647   1825    517     82690   133679
  FAIL            5028    0       0       24      5052
  PASS            43619   1825    517     82666   128627
  (%)             (89%)   (100%)  (100%)  (99%)   (96%)

  DLPAR is not tested!
  root@lotg5:~#
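
The percentage row in the summary above can be cross-checked from the TOTAL and FAIL counts. The following is a minimal Python sketch, assuming the report rounds each percentage down to an integer. That rounding rule is inferred from the numbers shown and is not taken from show.report.py; the names pass_percent and summarize are hypothetical.

```python
# Counts copied from the TOTAL and FAIL rows of the report above.
TOTAL = {"BASE": 48647, "IO": 1825, "NFS": 517, "TCP": 82690}
FAIL = {"BASE": 5028, "IO": 0, "NFS": 0, "TCP": 24}


def pass_percent(total, fail):
    """Integer pass percentage, rounded down (assumed rounding rule)."""
    return (total - fail) * 100 // total


def summarize(total, fail):
    """Per-focus pass percentages plus the overall SUM column."""
    rates = {focus: pass_percent(total[focus], fail[focus]) for focus in total}
    rates["SUM"] = pass_percent(sum(total.values()), sum(fail.values()))
    return rates


if __name__ == "__main__":
    for focus, rate in summarize(TOTAL, FAIL).items():
        print(f"{focus:5s} {rate}%")
```

Under that assumption the computed values agree with the (%) row of the report, including the 99% shown for TCP even though only 24 of 82690 runs failed.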

  - After 65+ hr of execution, lotg5 crashed with the following call trace
  Logs:
  ====
  [root@lotkvm ~]# virsh console lotg5
  Connected to domain lotg5
  Escape character is ^]

  0:mon> c
  cpus stopped: 0x0 0x4 0x8 0xc
  0:mon> d
  0000000000000000 **************** ****************  |                |
  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c0000000c4f4b620]
      pc: c000000000323720: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c000000000326dbc: writeback_sb_inodes+0x30c/0x590
      sp: c0000000c4f4b8a0
     msr: 8000000100009033
     dar: 0
   dsisr: 40000000
    current = 0xc00000017191cf60
    paca    = 0xc000000007b40000   softe: 0        irq_happened: 0x01
      pid   = 5792, comm = kworker/u32:5
  0:mon> t
  [c0000000c4f4b900] c000000000326dbc writeback_sb_inodes+0x30c/0x590
  [c0000000c4f4ba10] c000000000327124 __writeback_inodes_wb+0xe4/0x150
  [c0000000c4f4ba70] c00000000032758c wb_writeback+0x30c/0x450
  [c0000000c4f4bb40] c00000000032803c wb_workfn+0x14c/0x570
  [c0000000c4f4bc50] c0000000000dd1d0 process_one_work+0x1e0/0x5a0
  [c0000000c4f4bce0] c0000000000dd724 worker_thread+0x194/0x680
  [c0000000c4f4bd80] c0000000000e61e0 kthread+0x110/0x130
  [c0000000c4f4be30] c000000000009538 ret_from_kernel_thread+0x5c/0xa4
  --- Exception: 0  at 0000000000000000
  0:mon>
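
Each frame in the trace above uses the kernel's symbol+offset/size notation: the address lies offset bytes past the start of the named function, which is size bytes long. The following Python sketch recovers the function start and end addresses from such a line. The regular expression and the helper name parse_frame are assumptions made for illustration; they are not part of xmon or of any kernel tool.

```python
import re

# Matches "<16-hex address>[:] <symbol>+0x<offset>/0x<size>" as it appears
# in the xmon output above (both the pc:/lr: lines and the backtrace lines).
FRAME_RE = re.compile(
    r"(?P<addr>[0-9a-f]{16}):? +"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Return symbol, address, function start and end, or None if no match."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    address = int(match.group("addr"), 16)
    start = address - int(match.group("off"), 16)
    return {
        "symbol": match.group("sym"),
        "address": address,
        "start": start,
        "end": start + int(match.group("size"), 16),
    }


if __name__ == "__main__":
    lines = [
        "pc: c000000000323720: locked_inode_to_wb_and_lock_list+0x50/0x290",
        "[c0000000c4f4b900] c000000000326dbc writeback_sb_inodes+0x30c/0x590",
    ]
    for line in lines:
        frame = parse_frame(line)
        print(f"{frame['symbol']}: start=0x{frame['start']:x} end=0x{frame['end']:x}")
```

This is only a reading aid for the trace; mapping an offset back to a source line would still need the matching vmlinux with debug symbols.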

  
  == Comment: #4 - Chandan Kumar <ckuma...@in.ibm.com> - 2016-06-20 06:23:33 ==
  dmesg log:
  -------------
  [251403.003999] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
  [251403.471118] Unable to handle kernel paging request for data at address 0x00000000
  [251403.473391] Faulting instruction address: 0xc000000000323720  << ---- PC
  -------------
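
Assuming the bracketed dmesg values are the usual printk timestamps in seconds since boot, the time of the fault can be converted to hours and compared with the "65+ hr" of test execution mentioned in comment #0. The helper name uptime_hms is hypothetical.

```python
def uptime_hms(dmesg_seconds):
    """Split a dmesg timestamp (assumed seconds since boot) into h, m, s."""
    whole_seconds = int(dmesg_seconds)
    hours, remainder = divmod(whole_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


if __name__ == "__main__":
    # Timestamp of the "Unable to handle kernel paging request" line above.
    hours, minutes, seconds = uptime_hms(251403.471118)
    print(f"fault at {hours} h {minutes} min {seconds} s after boot")
```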

  0:mon> di c000000000323720
  c000000000323720  e93f0000    ld      r9,0(r31)  
  // [R31 = 0000000000000000, trying to de-reference null address]
  c000000000323724  39290050    addi    r9,r9,80
  c000000000323728  7fbf4840    cmpld   cr7,r31,r9
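
The three instruction words in the disassembly above can be decoded by hand to confirm that the faulting load uses r31 as its base register with a zero displacement, so a zero r31 gives the effective address 0 reported in dar. The following Python sketch handles only the three encodings that appear here (ld, addi and cmpl, as laid out in the Power ISA); it is an illustrative decoder, not a general disassembler, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(word):
    """Decode one 32-bit instruction word; supports ld, addi and cmplw/cmpld."""
    opcode = word >> 26
    field1 = (word >> 21) & 0x1F  # RT, or BF/L for compares
    ra = (word >> 16) & 0x1F
    if opcode == 58 and (word & 0x3) == 0:  # ld, DS-form
        disp = sign_extend(word & 0xFFFC, 16)
        return f"ld r{field1},{disp}(r{ra})"
    if opcode == 14:  # addi, D-form (the RA=0 "li" alias is not special-cased)
        return f"addi r{field1},r{ra},{sign_extend(word & 0xFFFF, 16)}"
    if opcode == 31 and ((word >> 1) & 0x3FF) == 32:  # cmpl, X-form
        rb = (word >> 11) & 0x1F
        mnemonic = "cmpld" if field1 & 1 else "cmplw"
        return f"{mnemonic} cr{field1 >> 2},r{ra},r{rb}"
    raise ValueError(f"unsupported instruction word 0x{word:08x}")


def load_address(word, regs):
    """Effective address of an ld instruction, given a register-number map."""
    if word >> 26 != 58 or word & 0x3:
        raise ValueError("not an ld instruction")
    ra = (word >> 16) & 0x1F
    base = 0 if ra == 0 else regs[ra]
    return (base + sign_extend(word & 0xFFFC, 16)) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    for word in (0xE93F0000, 0x39290050, 0x7FBF4840):
        print(f"{word:08x}  {decode(word)}")
    # r31 = 0 is the value noted in the comment on the faulting instruction.
    print(f"effective address: 0x{load_address(0xE93F0000, {31: 0}):x}")
```

The load at offset 0 from r31, followed by adding 80 and comparing against r31, is consistent with a pointer field being read through a NULL structure pointer; which C expression produced it is not stated in this report.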

  ====

  Dominic,

  Can you please take a look and assign this to suitable developer.

  Thanks,
  Chandan

  == Comment: #6 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-06-20 13:03:15 ==
  It sounds as if inode->i_wb was cleared while waiting for IO to be dropped in writeback_sb_inodes().

  That needs to be double-checked...

  == Comment: #10 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-06-21 05:11:35 ==
  That seems to be an already known issue raised by commit 43d1c0eb7e11 "block: detach bdev inode from its wb in __blkdev_put()".

  A patch has been posted to the lkml, but discussion about it is still ongoing:
  https://patchwork.kernel.org/patch/9184495/
  https://lkml.org/lkml/2016/6/17/676

  == Comment: #13 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-06-22 03:29:00 ==
  It appears that the right way to fix that would be https://patchwork.kernel.org/patch/9187409/.

  I can build a patched Ubuntu kernel on your node so that you can restart the test.
  Do you agree?

  == Comment: #14 - PRIYA M. A <priya...@in.ibm.com> - 2016-06-22 03:44:00 ==
  Sure Laurent. lotg5 is being installed. I will update this bug once installation is complete so that you can apply the patch on lotg5, and I will then start the tests on it.

  == Comment: #16 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-06-22 06:21:05 ==
  root@lotg5:~# uname -a
  Linux lotg5 4.4.0-24-generic #43+ldu SMP Wed Jun 22 03:24:05 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux

  The patched kernel (#43+ldu) is installed in place of the Ubuntu one and is running on lotg5.
  Please give it a try...

  == Comment: #19 - PRIYA M. A <priya...@in.ibm.com> - 2016-06-29 02:33:54 ==
  - Issue is not seen at lotg5

  == Comment: #21 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-07-12 12:01:00 ==
  (In reply to comment #20)
  > (In reply to comment #19)
  > > - Issue is not seen at lotg5
  > 
  > Can we close this bug then?

  I would prefer to wait for the patch mentioned in comment #13 to be accepted upstream.
  I'll update this bug once this is done.

  == Comment: #22 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-07-25 08:00:20 ==
  I asked on the mailing list why the patch mentioned in comment #13 is not yet upstream.
  I'll update the bug once I get a reply.

  == Comment: #23 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-07-26 10:27:34 ==
  The patch has been applied to the linux-fsdevel tree and is on its way to being applied in 4.8.
  I think this can now be closed.

  == Comment: #24 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-07-26 10:30:14 ==
  For the record: https://patchwork.kernel.org/patch/9247955/

  == Comment: #29 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-08-18 09:14:41 ==
  The patch is now part of the kernel 4.8-rc1.
  It would have to be backported to 16.04.

  == Comment: #31 - Laurent Dufour <laurent.duf...@fr.ibm.com> - 2016-08-18 09:16:25 ==
  Requesting mirroring to get the kernel commit dc5ff2b1d66f21c27a4c37236636dff6946437e4 backported to the Ubuntu kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1614565/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp