[Kernel-packages] [Bug 1898786] Re: bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled

Matthew Ruffell Fri, 27 Nov 2020 17:22:03 -0800

Hi Benjamin,

No worries about being busy.


Now, the kernel is scheduled to be released early next week, around the
30th of November. I think at this stage it is best to wait it out and
install the kernel once it reaches -updates.

That way you will have a fixed kernel that is supported by livepatch,
and you don't have to justify a reboot twice.

I did some regression testing in my comments above, and everything looks
okay. These patches also worked great in your test kernel. We have done
the best can to verify the kernel in the time given, so don't worry
about testing at this stage.

I'll let you know once the kernel has reached -updates, likely Monday or
Tuesday next week.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1898786

Title:
  bcache: Issues with large IO wait in bch_mca_scan() when shrinker is
  enabled

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1898786

  [Impact]

  Systems that utilise bcache can experience extremely high IO wait
  times when under constant IO pressure. The IO wait times seem to stay
  at a consistent 1 second, and never drop as long as the bcache
  shrinker is enabled.

  If you disable the shrinker, then IO wait drops significantly to
  normal levels.

  We did some perf analysis, and it seems we spend a huge amount of time
  in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock".

  Looking at the recent commits in Bionic, we found the following commit
  merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through
  upstream stable:

  commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b
  Author: Coly Li <col...@suse.de>
  Date: Wed Nov 13 16:03:24 2019 +0800
  Subject: bcache: at least try to shrink 1 node in bch_mca_scan()
  Link: 
https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b

  It mentions in the description that:

  > If sc->nr_to_scan is smaller than c->btree_pages, after the above
  > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is
  > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make
  > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring
  > and releasing mutex c->bucket_lock.

  This seems to be what is going on here, but the above commit only
  addresses when nr is 0.

  From what I can see, the problems we are experiencing are when nr is 1
  or 2, and again, we just waste time in bch_mca_scan() waiting on
  c->bucket_lock, only to release c->bucket_lock since the shrinker loop
  never executes since there is no work to do.

  [Fix]

  The following commits fix the problem, and all landed in 5.6-rc1:

  commit 125d98edd11464c8e0ec9eaaba7d682d0f832686
  Author: Coly Li <col...@suse.de>
  Date: Fri Jan 24 01:01:40 2020 +0800
  Subject: bcache: remove member accessed from struct btree
  Link: 
https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686

  commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795
  Author: Coly Li <col...@suse.de>
  Date: Fri Jan 24 01:01:41 2020 +0800
  Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan()
  Link: 
https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795

  commit e3de04469a49ee09c89e80bf821508df458ccee6
  Author: Coly Li <col...@suse.de>
  Date: Fri Jan 24 01:01:42 2020 +0800
  Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan()
  Link: 
https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6

  The first commit is a dependency of the other two. The first commit
  removes a "recently accessed" marker, used to indicate if a particular
  cache has been used recently, and if it has been, not consider it for
  cache eviction. The commit mentions that under heavy IO, all caches
  will end up being recently accessed, and nothing will ever be shrunk.

  The second commit changes a previous design decision of skipping the
  first 3 caches to shrink, since it is a common case to call
  bch_mca_scan() with nr being 1, or 2, just like 0 was common in the
  very first commit I mentioned. This time, if we land on 1 or 2, the
  loop exits and nothing happens, and we waste time waiting on locks,
  just like the very first commit I mentioned. The fix is to try shrink
  caches from the tail of the list, and not the beginning.

  The third commit fixes a minor issue where we don't want to re-arrange
  the linked list c->btree_cache, which is what the second commit ended
  up doing, and instead, just shrink the cache at the end of the linked
  list, and not change the order.

  One minor backport / context adjustment was required in the first
  commit for Bionic, and the rest are all clean cherry picks to Bionic
  and Focal.

  [Testcase]

  This is kind of hard to test, since the problem shows up in production
  environments that are under constant IO pressure, with many different
  items entering and leaving the cache.

  The Launchpad git server is currently suffering this issue, and has
  been sitting at a constant IO wait of 1 second / slightly over 1
  second which was causing slow response times, which was leading to
  build failures when git clones ended up timing out.

  We installed a test kernel, which is available in the following PPA:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test

  Once the test kernel had been installed, IO wait times with the
  shrinker enabled dropped to normal levels, and the git server became
  responsive again. We have been monitoring the performance of the git
  server and watching IO wait times in grafana over the past week, and
  everything is looking good, and indicate that these patches solve the
  issue.

  [Regression Potential]

  If a regression were to occur, it would only affect users of bcache
  who have the shrinker enabled, which I believe is the default setting
  in bcache deployments.

  In that case, users could disable the shrinker as a workaround until a fix is 
available. You can disable the shrinker with:
  "sudo bash -c "echo 1 > 
/sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled""

  The patches remove the "recently accessed" marker, which changes the
  behaviour slightly when it comes to deciding what to evict from the
  cache. Some workloads that aren't under IO pressure may see a
  difference in what items are evicted from the cache, which may
  slightly change bcache's performance profile.

  Since the patches prune the cache from the tail of the cache linked
  list, we will likely end up pruning more nodes than previously, since
  it was common to land on the first 3 caches, which were hard coded to
  be ignored and not pruned, which may also slightly change the bcache's
  performance profile.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898786/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1898786] Re: bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled

Reply via email to