Public bug reported:

[Impact]

A caching bug in the hibernation code can lead to memory corruption on
resume.

The hibernation code tracks all the allocated pages in memory (by pfn)
using a list of extents. Each extent holds a radix tree, and each node
in the tree contains a bitmap. This structure is used to save the
memory image to disk.

To speed up lookups in this structure, the kernel caches the position
of the previous lookup as the pair (current_extent, current_node).
However, if two consecutive lookups are far enough apart, the extent
can change while the kernel still uses the cached node (current_node).
It then accesses the wrong bitmap and ends up saving the wrong pfns to
disk.
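
The stale-cache scenario described above can be sketched in user space.
This is a simplified model, not the kernel code: a flat array of nodes
stands in for the radix tree, and all names (pfn_map, lookup_buggy,
current_node_off, the sizes) are hypothetical and chosen only for
illustration. It assumes the cached node is matched by its
extent-relative offset alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_NODE    64UL	/* pfns covered by one node (illustrative) */
#define NODES_PER_EXTENT 4UL	/* flat array stands in for the radix tree */

struct node { uint64_t bitmap; };

struct extent {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
	struct node nodes[NODES_PER_EXTENT];
};

struct pfn_map {
	struct extent *extents;
	size_t nr_extents;
	/* cached position of the previous lookup */
	struct extent *current_extent;
	struct node *current_node;
	unsigned long current_node_off;	/* extent-relative offset of current_node */
};

static struct extent *find_extent(struct pfn_map *m, unsigned long pfn)
{
	for (size_t i = 0; i < m->nr_extents; i++) {
		struct extent *e = &m->extents[i];

		if (pfn >= e->start_pfn && pfn < e->end_pfn)
			return e;
	}
	return NULL;
}

/* Buggy lookup: the node cache is keyed on the extent-relative offset only. */
static struct node *lookup_buggy(struct pfn_map *m, unsigned long pfn)
{
	struct extent *e = find_extent(m, pfn);

	if (!e)
		return NULL;

	unsigned long off = (pfn - e->start_pfn) & ~(BITS_PER_NODE - 1);

	/* BUG: never checks that current_node belongs to extent e */
	if (m->current_node && off == m->current_node_off)
		return m->current_node;

	m->current_extent = e;
	m->current_node = &e->nodes[off / BITS_PER_NODE];
	m->current_node_off = off;
	return m->current_node;
}

/* Returns 1 if the second lookup hands back a node from the wrong extent. */
int stale_cache_demo(void)
{
	struct extent ext[2] = {
		{ .start_pfn = 0,    .end_pfn = 256 },
		{ .start_pfn = 1024, .end_pfn = 1280 },
	};
	struct pfn_map m = { .extents = ext, .nr_extents = 2 };

	lookup_buggy(&m, 10);	/* caches ext[0].nodes[0] */

	/* same offset within the extent, but a different extent */
	struct node *n = lookup_buggy(&m, 1024 + 10);

	assert(n != NULL);
	return n == &ext[0].nodes[0];
}
```

In this model the second lookup lands in the second extent but returns
the bitmap node of the first one, which is the kind of mismatch that
would cause the wrong pfns to be recorded.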

[Test Case]

The bug has been reproduced on Xenial and Bionic by hibernating a large
instance with a lot of RAM (100GB+).

We also wrote a custom kernel module to better isolate the code that
triggers the problem: https://code.launchpad.net/~arighi/+git/mybitmap

This module contains exactly the same code as the hibernation path, so
it can be used as a fast test case to reproduce the problem without
triggering a real hibernation/resume cycle.

[Fix]

This bug can be fixed by properly invalidating the cached pair (extent,
node) when the next lookup falls in a different extent or a different
node.
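
A minimal user-space sketch of this kind of invalidation follows. It is
not the patch itself: a flat array of nodes stands in for the radix
tree, and all names (pfn_map, lookup_fixed, current_node_off, the
sizes) are hypothetical. The assumption is that the cached node is
reused only when both the extent and the extent-relative node offset
match; otherwise the cached pair is refreshed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_NODE    64UL	/* pfns covered by one node (illustrative) */
#define NODES_PER_EXTENT 4UL	/* flat array stands in for the radix tree */

struct node { uint64_t bitmap; };

struct extent {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
	struct node nodes[NODES_PER_EXTENT];
};

struct pfn_map {
	struct extent *extents;
	size_t nr_extents;
	/* cached position of the previous lookup */
	struct extent *current_extent;
	struct node *current_node;
	unsigned long current_node_off;	/* extent-relative offset of current_node */
};

static struct extent *find_extent(struct pfn_map *m, unsigned long pfn)
{
	for (size_t i = 0; i < m->nr_extents; i++) {
		struct extent *e = &m->extents[i];

		if (pfn >= e->start_pfn && pfn < e->end_pfn)
			return e;
	}
	return NULL;
}

/* Reuse the cached node only if the extent AND the node offset both match. */
static struct node *lookup_fixed(struct pfn_map *m, unsigned long pfn)
{
	struct extent *e = find_extent(m, pfn);

	if (!e)
		return NULL;

	unsigned long off = (pfn - e->start_pfn) & ~(BITS_PER_NODE - 1);

	if (e == m->current_extent && m->current_node &&
	    off == m->current_node_off)
		return m->current_node;

	/* Different extent or different node: refresh the cached pair. */
	m->current_extent = e;
	m->current_node = &e->nodes[off / BITS_PER_NODE];
	m->current_node_off = off;
	return m->current_node;
}

/* Returns 1 if a lookup in a new extent yields a node of that extent. */
int fixed_cache_demo(void)
{
	struct extent ext[2] = {
		{ .start_pfn = 0,    .end_pfn = 256 },
		{ .start_pfn = 1024, .end_pfn = 1280 },
	};
	struct pfn_map m = { .extents = ext, .nr_extents = 2 };

	lookup_fixed(&m, 10);	/* caches (ext[0], ext[0].nodes[0]) */

	/* same offset within the extent, but a different extent */
	struct node *n = lookup_fixed(&m, 1024 + 10);

	assert(n != NULL);
	return n == &ext[1].nodes[0];
}
```

The extra extent comparison costs one pointer check per lookup and
keeps the fast path for consecutive lookups that hit the same node.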

[Regression Potential]

The fix has been sent to the LKML for review/feedback
(https://lkml.org/lkml/2019/9/25/393). We have not received any feedback
so far, but the bug is clear and the fix is well tested on the affected
platforms. Moreover, the change is isolated to the hibernation code, so
the overall regression potential is minimal.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Xenial)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Bionic)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Disco)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Eoan)
     Importance: Undecided
         Status: New

** Summary changed:

- PM / hibernate: fix potential memory corruption on hibernate
+ PM / hibernate: fix potential memory corruption

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1847118

Title:
  PM / hibernate: fix potential memory corruption

Status in linux package in Ubuntu:
  New
Status in linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in linux source package in Disco:
  New
Status in linux source package in Eoan:
  New
