** Changed in: ubuntu-power-systems
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1709889

Title:
  Ubuntu 17.04: Bug in cfq scheduler, I/Os do not get submitted to
  adapter for a very long time.

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Zesty:
  In Progress

Bug description:
  ---Problem Description---
  When running a stress test, I/O sometimes hangs (reported in dmesg) or a
  "Host adapter abort request" error is seen.
    
  ---Steps to Reproduce---
   There are two ways to re-create the issue:
  (1) Run HTX; an I/O timeout backtrace appears in dmesg after several hours.
  (2) Run some I/O test, then reboot the system, and repeat these two steps.
      This way takes a long time to re-create the issue.
   
  ---uname output---
  4.10.0-11-generic

  The bulk of the effort for this issue is currently being worked in
  MicroSemi's JIRA https://jira.pmcs.com/browse/ESDIBMOP-133.

  Ran an interesting test: Ran HTX until I started getting the "stall"
  messages on the console, then shut down HTX and examined the I/O
  counters for the tested disks in sysfs:

  root@bostonp15:~# for i in \
      /sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:[2345]/0:2:[2345]:0; \
      do echo ${i##*/} $(<${i}/iorequest_cnt) $(<${i}/iodone_cnt); done
  0:2:2:0 0x5eba3d 0x5eba3d
  0:2:3:0 0x773cc9 0x773cc9
  0:2:4:0 0x782c61 0x782c61
  0:2:5:0 0x5ca134 0x5ca134
  root@bostonp15:~#

  So, none of the disks showed any evidence of having lost an I/O. I
  then restarted HTX and, aside from having to manually restart one of
  the disks, saw no problems with the testing. It appears that what was
  "hung" was purely in userland.

  This does not absolve the kernel or aacraid driver from blame, but it
  shows that the OS "believes" that it completed the I/O and thus
  removed it from the queue. What we don't know is whether the OS truly
  notified HTX about the completion, or if HTX (or userland libraries)
  just failed to process the notification.
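
  The sysfs counter comparison used above can be wrapped in a small
  helper that reports how many requests are still outstanding per
  device. This is only a sketch: the function names outstanding_ios and
  check_scsi_device are made up for illustration, and it assumes the
  counters are hexadecimal strings in the form printed above.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of HTX or
# the kernel.

# Print how many requests were issued but not yet completed, given the two
# hexadecimal counter values (e.g. 0x5eba3d) as read from sysfs.
outstanding_ios() {
    echo $(( $1 - $2 ))
}

# Report one SCSI device directory (a path whose last component is H:C:T:L).
check_scsi_device() {
    dev=$1
    req=$(cat "$dev/iorequest_cnt") || return 1
    done_cnt=$(cat "$dev/iodone_cnt") || return 1
    echo "${dev##*/} outstanding=$(outstanding_ios "$req" "$done_cnt")"
}

# Example use, with the same device glob as in the command above:
#   for d in /sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:[2345]/0:2:[2345]:0; do
#       check_scsi_device "$d"
#   done
```

  A result of outstanding=0 on every disk corresponds to the matching
  counters seen here. A non-zero value on an idle disk would point at an
  I/O that the kernel itself still considers in flight.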

  Tests are running again, will see what happens next.

  Update from JIRA:

  I have run some more experiments. Not sure what it tells us, but
  here's what I've seen.

  First test, ran until I got kernel messages about stalled tasks, then
  shut down HTX. After HTX was down, I checked the above mentioned
  counters and found that on each disk iorequest_cnt matched iodone_cnt.
  The disks were usable and I could restart HTX. This suggests that the
  problem is not in the PM8069 firmware, and makes the case for the
  aacraid driver having a bug somewhat weaker. However, this merely says
  that the driver "completed" the I/O as far as the kernel is concerned,
  not that a completion rippled back to the application.

  I restarted HTX and have run until errors. This time, I am leaving HTX
  running and observing. Two of the disks reached the HTX error
  threshold and the testers stopped (those 2 disks are now idle).
  Another disk saw errors, but the errors then stopped and it appears to
  be running fine now. The last disk has not seen any errors (yet). On
  the two idle (errored-out) disks I see iorequest_cnt matches
  iodone_cnt. I am able
  to "terminate and restart" the two idle disks and HTX appears to be
  testing them again "normally". Note that no reboot was required,
  further supporting the evidence that, as far as the kernel is
  concerned, there is nothing wrong with the disks and their I/O paths.

  So, I don't believe this completely eliminates aacraid from the
  picture, especially given we don't see this behavior on other
  systems/drivers. But, it probably moves the focus of the investigation
  away from the adapter firmware.

  Tried building the upstream 4.11 kernel on Ubuntu. This still gets the
  hangs. Both Ubuntu 4.10 and upstream 4.11 have aacraid driver
  1.2.1[50792]-custom.

  Good news/bad news... While doing an initial evaluation of the LSI-3008
  SAS HBA on Boston and Ubuntu 17.04, I am hitting this same problem.
  So, it appears to have nothing specific to do with the PM8069 or
  aacraid driver.

  Some notes on reproducing this. I have been using the github release of
  HTX, built using the following steps:

  1. apt install make gcc g++ git libncurses5-dev libcxl-dev libdapl-dev
     (others may be required)
  2. git clone https://github.com/open-power/HTX
  3. cd HTX
  4. make
  5. make deb

  Then install the resulting "htxubuntu.deb" package.

  Note, HTX will not test disks that have a filesystem or OS installed,
  so there must be at least two disks made available to HTX by clearing
  any previous data. A partition table is optional; in my testing I have
  none.

  Also, it may be desirable to run HTX somewhere other than the console,
  leaving the console free to watch for messages.

  To run:

  A. su - htx (this may take some time)
  B. htx
  C. Select the test file "mdt.io"
  D. Hit ENTER for default log file option
  E. Once the menu is displayed, select item "2" (Enable/disable hardware to test)
      E1. Enter "h" to disable (halt) all devices testing
      E2. Select at least two disks for testing (enter their line numbers)
      E3. Enter "q" to return to main menu
  F. Select item "4" (Continue On Error flags)
      F1. Enter line numbers for each disk previously selected to test.
      F2. Enter "q" to return to main menu.
  G. Select item "1" to begin the test exercisers.
  H. Optionally, select item "5" to display status of testing.

  After about 10-12 hours, there should be a few "INFO: task
  hxestorage:XXXXX blocked for more than 120 seconds." messages with
  stack traces. The typical stack trace is:

   sysctl_sched_migration_cost+0x0/0x4 (unreliable)
   __switch_to+0x2c0/0x450
   __schedule+0x2f8/0x990
   schedule+0x48/0xc0
   schedule_timeout+0x274/0x470
   io_schedule_timeout+0xd0/0x160
   debug_schedule+0x318/0x3c0
   __blkdev_direct_IO_simple+0x258/0x440
   blkdev_direct_IO+0x4e0/0x520
   generic_file_read_iter+0x2c8/0xaa0
   blkdev_read_iter+0x50/0x80
   new_sync_read+0xec/0x140
   vfs_read+0xbc/0x1b0
   SyS_read+0x68/0x110
   system_call+0x38/0xe0

  About 8 minutes after the "blocked" messages, you should start to see
  HTX reporting errors in "/tmp/htxerr" (HTX reports errors for I/Os
  that do not complete in 10 minutes, but continues to run).
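
  To watch for these stall messages without staring at the console,
  something like the following could be used. It is a sketch: the helper
  name count_blocked is hypothetical, and the match pattern is taken
  from the message text quoted above.

```shell
#!/bin/sh
# Sketch only: count_blocked is a hypothetical helper name.

# Count "blocked for more than N seconds" hung-task reports for the given
# task name in kernel log text read from standard input.
count_blocked() {
    grep -c "INFO: task $1:[0-9]* blocked for more than"
}

# Example use:
#   dmesg | count_blocked hxestorage
```

  A count that rises from zero marks the point from which the HTX errors
  in /tmp/htxerr should be expected a few minutes later.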

  With added debugging, it was seen that the I/Os do eventually
  complete, but in some cases it can take over an hour. It is also
  observed that I/O traffic continues through these periods of stalls,
  and so only a portion of the total I/O traffic actually gets stalled.
  The system does not hang, and if HTX is shut down (stopped), any
  stalled I/Os will complete immediately.

  Referencing LP1469829, it seems that it was requested that the "cfq"
  scheduler not be used by default, as it has this exact sort of bug,
  and that "deadline" should be used instead. Somewhere, the default got
  reverted back to "cfq", which exposes this bug again. It appears that
  the bug in "cfq" was never fixed, either.

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469829
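
  For anyone wanting to confirm which scheduler a test disk is using,
  the kernel lists the available schedulers in
  /sys/block/<dev>/queue/scheduler with the active one in square
  brackets. A sketch of reading it follows. The helper name
  active_scheduler is hypothetical, and "sdb" is only a placeholder for
  a disk under test.

```shell
#!/bin/sh
# Sketch only: active_scheduler is a hypothetical helper name.

# Extract the active scheduler (the bracketed entry) from the contents of
# /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]" gives "cfq".
active_scheduler() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example use, as root ("sdb" is a placeholder device name):
#   active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"
#   echo deadline > /sys/block/sdb/queue/scheduler
```

  Writing a scheduler name to the same file switches that disk at
  runtime, which allows a "cfq" versus "deadline" comparison without a
  reboot.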

  A couple of upstream commits of interest, ordered by perceived relevance:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5be6b75610cefd1e21b98a218211922c2feb6e08

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=142bbdfccc8b3e9f7342f2ce8422e76a3b45beae

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1709889/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
