Verified by code inspection on Bionic (4.15.0-102) and Xenial
(4.4.0-180). I don't have a Skylake system available; a user who
experienced this issue is running the test, and I'll comment here as
soon as he responds. I'm marking this as verified anyway, based on the
code inspection, since we need the patches in this cycle.

Cheers,


Guilherme

** Tags removed: verification-needed-bionic verification-needed-xenial
** Tags added: verification-done-bionic verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1877858

Title:
  Improve TSC refinement (and calibration) reliability

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  [Impact]
  * We recently received a report of TSC refinement going missing across
  multiple reboots of a server with an Intel Skylake-based processor.
  This was only reproducible in Bionic pre-5.0.

  * After checking kernel commits, we came up with 2 commits that
  largely improve the situation: a786ef152cdc ("x86/tsc: Make
  calibration refinement more robust")
  [git.kernel.org/linus/a786ef152cdc] and 604dc9170f24 ("x86/tsc: Use
  CPUID.0x16 to calculate missing crystal frequency")
  [git.kernel.org/linus/604dc9170f24]. We hereby request SRU for both of
  them.

  * The first commit improves some comments and adjusts an offset to
  match more recent (faster) machines, but the important part is a retry
  mechanism in the TSC refinement, for the case where it fails due to
  some disturbance during the TSC read (like NMIs/SMIs).
   
  * The second commit improves TSC calibration for Skylake (and some
  other models) by querying CPUID leaf 0x16 instead of relying on
  hardcoded, table-based values.

  * A note for Xenial (kernel 4.4): the second patch would require the
  inclusion of more commits, so given the "maturity" of this release
  (and the fact that kernel 4.15 is available as an HWE kernel for
  Xenial), I've kept it out of Xenial, backporting only the first and
  more important patch to 4.4.
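
  As a rough illustration of the arithmetic behind the second commit,
  here is a minimal Python sketch (not the kernel code). It assumes the
  usual CPUID layout: leaf 0x15 gives the TSC/crystal ratio (EAX =
  denominator, EBX = numerator) and leaf 0x16 gives the base frequency
  in MHz in EAX. The function names and the sample values are
  hypothetical.

```python
def crystal_khz_from_cpuid16(base_mhz: int, denominator: int, numerator: int) -> int:
    """Estimate the crystal frequency (kHz) when leaf 0x15 does not report it.

    base_mhz:    processor base frequency in MHz (assumed: CPUID.0x16 EAX)
    denominator: TSC/crystal ratio denominator (assumed: CPUID.0x15 EAX)
    numerator:   TSC/crystal ratio numerator   (assumed: CPUID.0x15 EBX)
    """
    if denominator == 0 or numerator == 0:
        raise ValueError("TSC/crystal ratio is not enumerated")
    # Integer arithmetic on purpose, mirroring how a kernel would compute it.
    return base_mhz * 1000 * denominator // numerator


def tsc_khz_from_crystal(crystal_khz: int, denominator: int, numerator: int) -> int:
    """TSC frequency (kHz) = crystal frequency * numerator / denominator."""
    return crystal_khz * numerator // denominator


# Illustrative values only, not read from real hardware.
crystal = crystal_khz_from_cpuid16(base_mhz=2400, denominator=2, numerator=200)
tsc = tsc_khz_from_crystal(crystal, denominator=2, numerator=200)
```

  With these made-up inputs the crystal comes out at 24000 kHz and the
  TSC at 2400000 kHz, with no per-model table involved.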

  [Test case]
  * Unfortunately there is no easy way to test the effectiveness of the
  commits, especially the refinement improvement.

  * The user who reported the missing refinements to us was able to test
  300 reboots with a regular Bionic kernel (and it reproduced the issue
  at least once), whereas when they tested with the Bionic kernel + both
  hereby proposed commits, the problem didn't happen.

  * Regarding the calibration commit, it was well tested by the
  community using multiple machines, comparing the TSC calibration read
  against the tables present in instlatx64.atw.hu.
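
  For a reboot loop like the one above, one possible way to check each
  boot is to look for the refinement message in the kernel log. The
  sketch below is an assumption-laden helper, not part of the proposed
  patches: it assumes the kernel prints a line containing "Refined TSC
  clocksource calibration: <value> MHz" when refinement succeeds, and
  the function name and sample log lines are hypothetical.

```python
import re
from typing import Optional

# Assumed message format; adjust if the kernel under test prints it differently.
REFINED_RE = re.compile(r"Refined TSC clocksource calibration: (\d+\.\d+) MHz")


def refined_tsc_mhz(log_text: str) -> Optional[float]:
    """Return the refined TSC frequency found in log_text, or None if absent."""
    match = REFINED_RE.search(log_text)
    return float(match.group(1)) if match else None


# Made-up log excerpts, for illustration only.
boot_with_refinement = "[    1.500000] tsc: Refined TSC clocksource calibration: 2400.000 MHz\n"
boot_without_refinement = "[    0.000000] tsc: Detected 2400.000 MHz processor\n"

ok = refined_tsc_mhz(boot_with_refinement)
missing = refined_tsc_mhz(boot_without_refinement)
```

  A boot where the helper returns None after the system has been up for
  a while would count as a missing refinement.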

  [Regression potential]
  * We consider the regression potential low, especially due to the
  nature of the patches: the first is basically a retry mechanism (plus
  an adjustment to an offset to reflect more recent machines), and the
  2nd is an improvement to TSC calibration on some platforms (whose
  values are currently hardcoded in a table in the kernel). Also, the
  patches have been upstream for a while and I couldn't find any fixes
  for them.

  * A hypothetical regression from the 2nd patch could be in the TSC
  precision calculation, which refinement itself might as well
  circumvent. From the first patch, a bug in the code is the one
  hypothetical regression I could think of.
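
  To make the "retry mechanism" concrete, here is a minimal Python model
  of the idea, not the kernel implementation: a reference clock is read
  between two TSC reads, and the sample is only accepted when the two
  TSC reads are close together (so no NMI/SMI landed in between). The
  name read_ref_bracketed, its parameters and the default of 5 retries
  are assumptions for this example.

```python
from typing import Callable, Optional, Tuple


def read_ref_bracketed(
    read_tsc: Callable[[], int],
    read_ref: Callable[[], int],
    threshold: int,
    max_retries: int = 5,  # assumed value, for illustration
) -> Optional[Tuple[int, int]]:
    """Return (tsc, ref) from an undisturbed sample, or None if all tries failed."""
    for _ in range(max_retries):
        t1 = read_tsc()
        ref = read_ref()
        t2 = read_tsc()
        # A large gap between the two TSC reads means something (e.g. an
        # NMI or SMI) interrupted the sample, so discard it and try again.
        if t2 - t1 < threshold:
            return t2, ref
    return None


# Simulated reads: the first sample is disturbed, the second one is clean.
tsc_values = iter([0, 1_000_000, 1_000_100, 1_000_150])
ref_values = iter([10, 20])
sample = read_ref_bracketed(
    lambda: next(tsc_values), lambda: next(ref_values), threshold=1000
)
```

  A caller that gets None back can simply schedule another attempt
  later instead of giving up on refinement for that boot.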

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1877858/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
