Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-21 Thread Aurelien Jarno
On Thu, Oct 16, 2014 at 07:49:29PM -0400, Carlos O'Donell wrote:
 I disagree. IMO the most flexible approach is for glibc to stop using cpuid
 for RTM detection and rely on the kernel to tell it if RTM is usable. Then
 we have a single hardware blacklist in the kernel. We need to talk to
 kernel people about this. Not to mention we might extend a getauxval-type
 API to prevent applications from using cpuid directly e.g. create a
 platform header for this with an x86 specific feature interface.

That looks like a good plan in the long term. That said, if we involve
the kernel in this, it might take months or even longer until everything is
ready and in sync.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20141021101038.gv...@hall.aurel32.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-21 Thread Aurelien Jarno
On Mon, Oct 20, 2014 at 11:51:14AM -0200, Henrique de Moraes Holschuh wrote:
 On Thu, 16 Oct 2014, Carlos O'Donell wrote:
  I disagree. IMO the most flexible approach is for glibc to stop using cpuid
  for RTM detection and rely on the kernel to tell it if RTM is usable. Then
  we have a single hardware blacklist in the kernel. We need to talk to
  kernel people about this. Not to mention we might extend a getauxval-type
  API to prevent applications from using cpuid directly e.g. create a
  platform header for this with an x86 specific feature interface.
 
 We are about to freeze for the Jessie release.  I am only asking that we put
 a stopgap measure in place until a proper fix can be deployed upstream (and
 backported to Debian's glibc 2.19 and Linux 3.16).

This is a serious issue, and there is a bug report about it. We will
definitely fix it before the release. The freeze is only one step and not
an excuse to rush things. An RC bug can be fixed after the freeze, even
if it would be better to fix it before.

In addition, I have been busy with things that I consider more
important: fixing glibc and tzdata for old-stable and stable.

 At this time I don't care whether we go with the processor blacklist or take
 the more conservative path and disable lock elision code in Debian, as long
 as we do something.

I feel sad that the lock elision code can't really be disabled without
using additional patches that are not even upstream. I have tested your 
blacklist patch, and it works fine on the few systems I have tested.
I'll therefore go that way.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Archive: https://lists.debian.org/20141021101721.gw...@hall.aurel32.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-21 Thread Henrique de Moraes Holschuh
On Tue, 21 Oct 2014, Aurelien Jarno wrote:
 On Thu, Oct 16, 2014 at 07:49:29PM -0400, Carlos O'Donell wrote:
  I disagree. IMO the most flexible approach is for glibc to stop using cpuid
  for RTM detection and rely on the kernel to tell it if RTM is usable. Then
  we have a single hardware blacklist in the kernel. We need to talk to
  kernel people about this. Not to mention we might extend a getauxval-type
  API to prevent applications from using cpuid directly e.g. create a
  platform header for this with an x86 specific feature interface.
 
 That looks like a good plan in the long term. That said, if we involve
 the kernel in this, it might take months or even longer until everything is
 ready and in sync.

Depending on what you need, I can do the kernel side, but that's two weeks
to one month to write and test the patch, plus the time required to get it
reviewed in LKML, accepted, and merged in mainline on the *next* merge
window after it was accepted (which can be three months away worst-case).

Six months is a realistic, if a bit optimistic, target for this.  If we add
the patch to the Debian kernel after it is accepted, but before it is
merged, three months.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20141021121727.gc22...@khazad-dum.debian.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-20 Thread Henrique de Moraes Holschuh
On Thu, 16 Oct 2014, Carlos O'Donell wrote:
 I disagree. IMO the most flexible approach is for glibc to stop using cpuid
 for RTM detection and rely on the kernel to tell it if RTM is usable. Then
 we have a single hardware blacklist in the kernel. We need to talk to
 kernel people about this. Not to mention we might extend a getauxval-type
 API to prevent applications from using cpuid directly e.g. create a
 platform header for this with an x86 specific feature interface.

We are about to freeze for the Jessie release.  I am only asking that we put
a stopgap measure in place until a proper fix can be deployed upstream (and
backported to Debian's glibc 2.19 and Linux 3.16).

At this time I don't care whether we go with the processor blacklist or take
the more conservative path and disable lock elision code in Debian, as long
as we do something.

*I* can't do much more than provide patches, I am not about to NMU glibc.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20141020135114.ga19...@khazad-dum.debian.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-16 Thread Aurelien Jarno
On Sat, Sep 20, 2014 at 12:05:54AM -0400, Carlos O'Donell wrote:
 On Fri, Sep 19, 2014 at 9:59 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  On Fri, 19 Sep 2014, Carlos O'Donell wrote:
  On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
  h...@debian.org wrote:
   On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
   I can live with that, and I think I can prepare a patch if you want me 
   to.
  
   Here's a minimal patch to glibc that should do it (compile tested).
 
  The GNU C Library only uses elision if built with 
  --enable-lock-elision=yes.
 
  All you need to do is not build glibc with this flag.
 
  Given Jessie's expected lifetime, I'd say the blacklist is a much better
  choice with the data currently available.
 
 Why not ignore this, call it a hardware problem, and let users update
 the microcode if their devices are broken?

The real problem that we try to address here is not that we want to
disable TSX on these machines because it's broken. What we want to
address here is that users might upgrade the microcode from Linux
(either during boot time or later), which will cause all processes using
pthread (which also includes systemd) to die almost instantaneously.

We might want to simply use --enable-lock-elision=no, as currently no CPU
supports TSX (after microcode upgrade), but the problem will be there
again once TSX support is added back to new CPUs. As the problem will
have to be solved at some point, let's do it now, especially given that
I hope that TSX will be added back during the lifetime of Jessie. I 
therefore think Henrique's approach is the correct one. It also means
that we only need to blacklist TSX on machines which have a microcode
update available.

I am currently doing a test build with his patch, if everything goes
well I'll merge it in Debian. Henrique: in that case it would be nice
if you can forward this patch upstream for discussion.

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Archive: https://lists.debian.org/20141016221333.gs...@hall.aurel32.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-10-16 Thread Carlos O'Donell
I disagree. IMO the most flexible approach is for glibc to stop using cpuid
for RTM detection and rely on the kernel to tell it if RTM is usable. Then
we have a single hardware blacklist in the kernel. We need to talk to
kernel people about this. Not to mention we might extend a getauxval-type
API to prevent applications from using cpuid directly e.g. create a
platform header for this with an x86 specific feature interface.
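
A minimal sketch of what such a getauxval-type query could look like from
the application side. getauxval() and AT_HWCAP2 are real glibc/Linux
interfaces, but the RTM bit below is purely hypothetical (no such x86 hwcap
bit is assumed to exist), and the function names are invented for
illustration only.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval() */

#ifndef AT_HWCAP2
#define AT_HWCAP2 26    /* auxv key for the second hwcap word */
#endif

/* HYPOTHETICAL bit: invented for this sketch, not defined by any kernel. */
#define HYPOTHETICAL_HWCAP2_X86_RTM (1UL << 31)

/* Pure helper: decide RTM usability from an hwcap word. */
bool rtm_usable_from_hwcap2(unsigned long hwcap2)
{
    return (hwcap2 & HYPOTHETICAL_HWCAP2_X86_RTM) != 0;
}

/* Ask the kernel (via the auxiliary vector) instead of executing cpuid.
   A kernel-side blacklist would simply never set the bit. */
bool rtm_usable(void)
{
    return rtm_usable_from_hwcap2(getauxval(AT_HWCAP2));
}
```

The point of the design is that the decision moves into one place: the
kernel clears the bit on affected hardware, and userspace never consults
cpuid for this feature.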

c.


Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-29 Thread Henrique de Moraes Holschuh
On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
 On Fri, 19 Sep 2014, Carlos O'Donell wrote:
  On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
  h...@debian.org wrote:
   On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
   I can live with that, and I think I can prepare a patch if you want me 
   to.
  
   Here's a minimal patch to glibc that should do it (compile tested).
  
  The GNU C Library only uses elision if built with --enable-lock-elision=yes.
  
  All you need to do is not build glibc with this flag.
 
 Given Jessie's expected lifetime, I'd say the blacklist is a much better
 choice with the data currently available.

Fedora has temporarily disabled lock elision on F20, F21, rawhide.  They've
detected the need to keep it enabled on s390, s390x though.

https://bugzilla.redhat.com/show_bug.cgi?id=1146967

Apparently there's at least one codepath that attempts to use lock elision
regardless of --enable-lock-elision in x86 in rwlock.  I'm searching for it.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20140929122712.ga11...@khazad-dum.debian.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-29 Thread Henrique de Moraes Holschuh
On Mon, 29 Sep 2014, Henrique de Moraes Holschuh wrote:
 On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
 Apparently there's at least one codepath that attempts to use lock elision
 regardless of --enable-lock-elision in x86 in rwlock.  I'm searching for it.

Indeed it looks like glibc 2.19 also fails to disable hardware lock elision
entirely, even in a --disable-lock-elision build.

The blacklist patch would disable it, though, as setting HAS_RTM to zero
must be enough to disable any use of Intel TSX.  Otherwise libpthread would
SIGILL on every processor that doesn't have Intel TSX instructions enabled,
and we know it isn't doing that.

After looking at it a bit closer, I have changed my stance: I recommend
that lock elision support in glibc should be disabled for Debian jessie.
This thing doesn't look nearly stable enough in glibc 2.19, and any bugs it
might cause *will be subtle*.


Patch source: Fedora RPM (glibc-2.20-5.fc21)
  http://koji.fedoraproject.org/koji/buildinfo?buildID=581316

Original patch author: Carlos O'Donell car...@redhat.com


This patch required some manual fixup.  I've done some light inspection of
the code to make sure it is sane.

NOTE: it looks like HLE is always attempted in rwlocks in any boxes that
advertise HAS_RTM, even if __libc_enable_secure is set.  The patch _does
not_ change this behaviour, be it safe or unsafe.  Fedora's patch doesn't
change this, and I don't know enough to judge.

The patch was compile-tested.  Some tests on the testsuite are failing, but
apparently not because of the patch.

objdump tells me Intel TSX opcodes DO leak into libpthread (inside the
__lll_*_elision routines) no matter what you do, i.e. --disable-lock-elision
is indeed a lie if you don't take pains to make sure these routines are not
called at runtime.
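
As a rough illustration of the kind of check objdump makes possible, the
sketch below counts byte patterns that match the RTM instruction encodings
(XBEGIN = C7 F8, XABORT = C6 F8, XEND = 0F 01 D5, XTEST = 0F 01 D6) in a
buffer. It is an assumption-laden heuristic, not a disassembler: it can
report false positives when those bytes occur inside other instructions or
data, and it does not look for the HLE prefixes. The function name is
invented for this example.

```c
#include <stddef.h>

/* Count byte sequences that look like RTM instructions.
   Heuristic only: matches raw bytes without decoding instruction
   boundaries, so hits must be confirmed with a real disassembler. */
size_t count_rtm_opcode_patterns(const unsigned char *buf, size_t len)
{
    size_t hits = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xC7 && buf[i + 1] == 0xF8) {
            hits++;                               /* XBEGIN rel32 */
        } else if (buf[i] == 0xC6 && buf[i + 1] == 0xF8) {
            hits++;                               /* XABORT imm8 */
        } else if (i + 2 < len && buf[i] == 0x0F && buf[i + 1] == 0x01 &&
                   (buf[i + 2] == 0xD5 || buf[i + 2] == 0xD6)) {
            hits++;                               /* XEND or XTEST */
        }
    }
    return hits;
}
```

A caller would read the text section of the library into memory and pass it
in; a non-zero count only says the routines deserve a closer look.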

Since we shouldn't trust --disable-lock-elision to do the full job, it seems
best to make doubly sure Intel TSX won't be used: patch to override HAS_RTM
to really disable this stuff attached (glibc_2.19-really-disable-hle.patch).

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
--- a/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c	2014-02-07 07:04:38.0 -0200
+++ b/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c	2014-09-29 11:19:55.899095112 -0300
@@ -62,9 +62,15 @@
 	  char **argv  __attribute__ ((unused)),
 	  char **environ)
 {
-  __elision_available = HAS_RTM;
 #ifdef ENABLE_LOCK_ELISION
+  __elision_available = HAS_RTM;
   __pthread_force_elision = __libc_enable_secure ? 0 : __elision_available;
+  if (!HAS_RTM)
+    __elision_aconf.retry_try_xbegin = 0; /* Disable elision on rwlocks */
+#else
+  __elision_available = 0;
+  __pthread_force_elision = 0;
+  __elision_aconf.retry_try_xbegin = 0;
 #endif
 }
 
--- a/sysdeps/x86_64/multiarch/init-arch.h	2014-09-29 11:36:07.511536944 -0300
+++ b/sysdeps/x86_64/multiarch/init-arch.h	2014-09-29 11:36:59.622947602 -0300
@@ -152,7 +152,7 @@
 # define HAS_SSSE3	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSSE3)
 # define HAS_SSE4_1	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_1)
 # define HAS_SSE4_2	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_1, ecx, bit_SSE4_2)
-# define HAS_RTM	HAS_CPU_FEATURE (COMMON_CPUID_INDEX_7, ebx, bit_RTM)
+# define HAS_RTM	(0)
 
 # define index_Fast_Rep_String		FEATURE_INDEX_1
 # define index_Fast_Copy_Backward	FEATURE_INDEX_1


Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Henrique de Moraes Holschuh
Package: libc6
Version: 2.19-0experimental0
Severity: grave
Justification: causes non-serious data loss

libpthread-2.19 has HLE (hardware-assisted lock elision) support.
Unfortunately, on Intel-based x86 processors, the use of HLE is currently
hazardous.

Summary:  Use of HLE on all current Intel Haswell processors (the only x86
processors with HLE support so far) can cause unpredictable system
behaviour, including the possibility of hangs and memory corruption.
Updating the microcode on these Intel Haswell processors when Intel TSX is
in use by libpthread will cause running processes linked to libpthread to be
killed with SIGILL.

This issue is, AFAIK, impossible to work around in the kernel.  Since glibc
uses the cpuid instruction directly, the kernel cannot prevent libpthreads
from attempting to use Intel TSX.
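
To make the mechanism concrete, here is a minimal sketch of userspace
feature detection through cpuid, which is why the kernel has no chance to
intervene. It reads CPUID leaf 7 (subleaf 0) and tests EBX bit 4 (HLE) and
bit 11 (RTM). It assumes a GCC- or Clang-style <cpuid.h>; the helper names
are made up for this example and are not glibc's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid_max(), __cpuid_count() */
#endif

#define CPUID7_EBX_HLE (UINT32_C(1) << 4)
#define CPUID7_EBX_RTM (UINT32_C(1) << 11)

bool ebx_has_hle(uint32_t ebx) { return (ebx & CPUID7_EBX_HLE) != 0; }
bool ebx_has_rtm(uint32_t ebx) { return (ebx & CPUID7_EBX_RTM) != 0; }

/* Return CPUID.(EAX=7,ECX=0):EBX, or 0 when leaf 7 is unavailable
   (or when not building for x86). */
uint32_t read_cpuid7_ebx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ebx;
#else
    return 0;
#endif
}
```

Any process can run `ebx_has_rtm(read_cpuid7_ebx())` without a system call,
so a blacklist has to live in the code that interprets the result.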

Non-free will work around the microcode update issue by enforcing that all
microcode updates be done in the initramfs (i.e. require a reboot to apply,
and require initramfs).

Unfortunately, this is not going to be enough as most users don't have
intel-microcode installed in their Intel-based systems, and therefore would
still be at risk of data loss or data corruption due to erratum HSD136.

Please disable hardware-assisted lock elision (HLE) on X86/X86-64 Intel
Haswell Processors in libpthreads.


Details:


On unpatched Intel processors, HLE will hit erratum HSD136:

HSD136.  Software Using Intel® TSX May Result in Unpredictable System
 Behavior

Problem: Under a complex set of internal timing conditions and system
 events, software using the Intel TSX (Transactional Synchronization
 Extensions) instructions may result in unpredictable system
 behavior.

(Erratum description from: Desktop 4th Generation Intel Core Processor Family
Specification Update, June 2013, #328899-001).

This erratum is serious enough for Intel to take the PR hit and withdraw the
feature on all Haswell cores, including the just-launched Haswell-EP E5v3
Xeons.  (ref:
http://www.anandtech.com/show/8376/intel-disables-tsx-instructions-erratum-found-in-haswell-haswelleep-broadwelly
).

On patched Intel processors, Intel TSX will be disabled by the microcode.
When disabled, any Intel TSX instructions will generate an illegal opcode
trap.  Intel TSX support supposedly can be re-enabled *during system boot*
by the UEFI firmware through an undisclosed method.

Unfortunately, the act of updating the microcode will immediately disable
Intel TSX, causing all running processes linked to libpthread-2.19 to trap
and crash with SIGILL:

[ 43.606830] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.608466] microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
[ 43.608494] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609327] microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
[ 43.609352] do_trap: 267 callbacks suppressed
[ 43.609354] traps: rs:main Q:Reg[1343] trap invalid opcode ip:7f32abd0b7ab
sp:7f32a9062848 error:0
[ 43.609355] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.609358] in libpthread-2.19.so[7f32abcfa000+18000]
[ 43.610204] microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
[ 43.610225] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1a
[ 43.611081] microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
[ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab
sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
[...]

Ref: https://bugs.launchpad.net/intel/+bug/1370352

It is unknown at this time what will happen on future microcode updates.  It
is entirely possible that the act of updating the microcode will always
reset Intel TSX to its default disabled state, regardless of whether the
BIOS had force-enabled it or not at boot.   This is the reason why I will
drop support for microcode updates outside of the initramfs in non-free.


Therefore, due to erratum HSD136 and the lack of widespread use of microcode
updates, libpthread-2.19 must stop using HLE on the problematic Intel
processors.

Here's the data required for the blacklist:

CPUID signature : family : model : stepping
0x000306fZ      :   6    :  63   : Z <= 2
0x000306cZ      :   6    :  60   : Z <= 3
0x0004065Z      :   6    :  69   : Z <= 1
0x0004066Z      :   6    :  70   : Z <= 1
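
The mapping between a raw CPUID signature and the family/model/stepping
columns above can be sketched as follows. The decoding follows the usual
x86 convention (extended model bits apply to families 6 and 15), and the
stepping column is treated as an upper bound. The struct and function names
are invented for this example, and the caller is assumed to have already
checked that the vendor is Intel.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode CPUID leaf 1 EAX into display family, model and stepping. */
struct cpu_sig decode_cpuid_signature(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0x0f;
    unsigned ext_family = (eax >> 20) & 0xff;
    unsigned ext_model = (eax >> 16) & 0x0f;

    s.stepping = eax & 0x0f;
    s.family = base_family;
    s.model = (eax >> 4) & 0x0f;
    if (base_family == 0x0f)
        s.family += ext_family;
    if (base_family == 0x06 || base_family == 0x0f)
        s.model |= ext_model << 4;
    return s;
}

/* True if the signature falls in the Haswell list from the table.
   Assumes the vendor check (Intel) was done by the caller. */
bool tsx_blacklisted(uint32_t eax)
{
    struct cpu_sig s = decode_cpuid_signature(eax);

    if (s.family != 6)
        return false;
    return (s.model == 63 && s.stepping <= 2) ||
           (s.model == 60 && s.stepping <= 3) ||
           (s.model == 69 && s.stepping <= 1) ||
           (s.model == 70 && s.stepping <= 1);
}
```

For instance, signature 0x000306c3 decodes to family 6, model 60,
stepping 3, which matches the second row of the table.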

Note: this list is not likely to be complete.  Some Engineering Sample
signatures may be missing, as well as other Haswell processor signatures we
don't know about.

You may want to consider blacklisting HLE on all Intel processors (not just
the processors above) until we are sure we know about the cpuid signature of
all processors that need blacklisting.


[1] Haswell/Haswell-E/Haswell-EP processors running with the following
microcode installed, or any later revision:

sig 0x000306f2, 2014-09-03, rev 0x0029
sig 0x000306c3, 2014-07-03, rev 0x001c
sig 0x00040651, 2014-07-03, rev 0x001c
sig 0x00040661, 2014-07-03, rev 0x0012

This list is likely 

Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Aurelien Jarno
On Fri, Sep 19, 2014 at 10:09:24AM -0300, Henrique de Moraes Holschuh wrote:
 Package: libc6
 Version: 2.19-0experimental0
 Severity: grave
 Justification: causes non-serious data loss
 
 libpthread-2.19 has HLE (hardware-assisted lock elision) support.
 Unfortunately, on Intel-based x86 processors, the use of HLE is currently
 hazardous.
 
 Summary:  Use of HLE on all current Intel Haswell processors (the only x86
 processors with HLE support so far) can cause unpredictable system
 behaviour, including the possibility of hangs and memory corruption.
 Updating the microcode on these Intel Haswell processors when Intel TSX is
 in use by libpthread will cause running processes linked to libpthread to be
 killed with SIGILL.
 
 This issue is, AFAIK, impossible to work around in the kernel.  Since glibc
 uses the cpuid instruction directly, the kernel cannot prevent libpthreads
 from attempting to use Intel TSX.
 
 Non-free will work around the microcode update issue by enforcing that all
 microcode updates be done in the initramfs (i.e. require a reboot to apply,
 and require initramfs).
 
 Unfortunately, this is not going to be enough as most users don't have
 intel-microcode installed in their Intel-based systems, and therefore would
 still be at risk of data loss or data corruption due to erratum HSD136.
 
 Please disable hardware-assisted lock elision (HLE) on X86/X86-64 Intel
 Haswell Processors in libpthreads.
 
 
 Details:
 
 
 On unpatched Intel processors, HLE will hit erratum HSD136:
 
 HSD136.  Software Using Intel® TSX May Result in Unpredictable System
  Behavior
 
 Problem: Under a complex set of internal timing conditions and system
  events, software using the Intel TSX (Transactional Synchronization
  Extensions) instructions may result in unpredictable system
  behavior.
 
 (Erratum description from: Desktop 4th Generation Intel Core Processor Family
 Specification Update, June 2013, #328899-001).
 
 This erratum is serious enough for Intel to take the PR hit and withdraw the
 feature on all Haswell cores, including the just-launched Haswell-EP E5v3
 Xeons.  (ref:
 http://www.anandtech.com/show/8376/intel-disables-tsx-instructions-erratum-found-in-haswell-haswelleep-broadwelly
 ).
 
 On patched Intel processors, Intel TSX will be disabled by the microcode.
 When disabled, any Intel TSX instructions will generate an illegal opcode
 trap.  Intel TSX support supposedly can be re-enabled *during system boot*
 by the UEFI firmware through an undisclosed method.
 
 Unfortunately, the act of updating the microcode will immediately disable
 Intel TSX, causing all running processes linked to libpthread-2.19 to trap
 and crash with SIGILL:
 
 [ 43.606830] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1a
 [ 43.608466] microcode: CPU0 updated to revision 0x1c, date = 2014-07-03
 [ 43.608494] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1a
 [ 43.609327] microcode: CPU1 updated to revision 0x1c, date = 2014-07-03
 [ 43.609352] do_trap: 267 callbacks suppressed
 [ 43.609354] traps: rs:main Q:Reg[1343] trap invalid opcode ip:7f32abd0b7ab
 sp:7f32a9062848 error:0
 [ 43.609355] microcode: CPU2 sig=0x306c3, pf=0x2, revision=0x1a
 [ 43.609358] in libpthread-2.19.so[7f32abcfa000+18000]
 [ 43.610204] microcode: CPU2 updated to revision 0x1c, date = 2014-07-03
 [ 43.610225] microcode: CPU3 sig=0x306c3, pf=0x2, revision=0x1a
 [ 43.611081] microcode: CPU3 updated to revision 0x1c, date = 2014-07-03
 [ 43.611507] traps: systemd[1] trap invalid opcode ip:7f844f84a7ab
 sp:7fff2ccf7e28 error:0 in libpthread-2.19.so[7f844f839000+18000]
 [...]
 
 Ref: https://bugs.launchpad.net/intel/+bug/1370352

It looks like Intel did crap there, and that the GNU libc has to handle
this crap. The microcode update could have stopped advertising the
instructions while still supporting them...

 It is unknown at this time what will happen on future microcode updates.  It
 is entirely possible that the act of updating the microcode will always
 reset Intel TSX to its default disabled state, regardless of whether the
 BIOS had force-enabled it or not at boot.   This is the reason why I will
 drop support for microcode updates outside of the initramfs in non-free.
 
 
 Therefore, due to erratum HSD136 and the lack of widespread use of microcode
 updates, libpthread-2.19 must stop using HLE on the problematic Intel
 processors.

I will try to work on a patch, but this won't be enough: until users
reboot their systems, it's very likely that some processes using the old
libpthread with HLE enabled will remain.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Archive: https://lists.debian.org/20140919194510.gm...@hall.aurel32.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Henrique de Moraes Holschuh
On Fri, 19 Sep 2014, Aurelien Jarno wrote:
 It looks like Intel did crap there, and that the GNU libc has to handle
 this crap. The microcode update could have stopped advertising the
 instructions while still supporting them...

They had their reasons to not do it that way, I suppose.  I don't think the
Intel microcode teams optimize for any update mode other than doing it at
firmware init or kernel init.

What gets me angry is that, as usual, we had no idea of what the microcode
update would do, since we also had no idea of what errata it was supposed to
fix.

  Therefore, due to erratum HSD136 and the lack of widespread use of microcode
  updates, libpthread-2.19 must stop using HLE on the problematic Intel
  processors.
 
 I will try to work on a patch but this won't be enough, until the users
 reboot their system it's very likely that some process using the old
 libpthread with HLE enabled will remain.

I can live with that, and I think I can prepare a patch if you want me to.
What I cannot do is test it.

BTW, I believe it should be possible to warn the user in postinst.  We just
need to look for the hle flag in /proc/cpuinfo when upgrading from a package
version that did not have the blacklist in place.
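
A minimal sketch of the /proc/cpuinfo check described above. A real
postinst would more likely be a shell script; C is used here only to match
the rest of the thread. The function names are invented, the 8 KiB line
buffer is an assumption about the length of the flags line, and the check
only reports whether the flag is present, not whether the CPU is affected.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-separated token in `line`. */
bool flags_line_has(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[n];
        bool end_ok = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (start_ok && end_ok)
            return true;
        p++;
    }
    return false;
}

/* Returns 1 if any "flags" line in the file lists `flag`, 0 if none
   does, and -1 if the file cannot be opened. */
int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[8192];
    FILE *f = fopen(path, "r");
    int found = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0 && flags_line_has(line, flag)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A maintainer script could call `cpuinfo_has_flag("/proc/cpuinfo", "hle")`
and print a reboot recommendation when it returns 1.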

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20140919202541.ga30...@khazad-dum.debian.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Henrique de Moraes Holschuh
On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
 I can live with that, and I think I can prepare a patch if you want me to.

Here's a minimal patch to glibc that should do it (compile tested).

This minimal patch does not give the local admin any way to override the
Intel TSX blacklist.  I don't know how this kind of override is usually done
(if it is done at all) in glibc.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh
diff --git a/sysdeps/x86_64/multiarch/init-arch.c b/sysdeps/x86_64/multiarch/init-arch.c
index db74d97..6f61ae6 100644
--- a/sysdeps/x86_64/multiarch/init-arch.c
+++ b/sysdeps/x86_64/multiarch/init-arch.c
@@ -26,7 +26,7 @@ struct cpu_features __cpu_features attribute_hidden;
 
 
 static void
-get_common_indeces (unsigned int *family, unsigned int *model)
+get_common_indeces (unsigned int *family, unsigned int *model, unsigned int *stepping)
 {
   __cpuid (1, __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax,
 	   __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ebx,
@@ -36,6 +36,7 @@ get_common_indeces (unsigned int *family, unsigned int *model)
   unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
   *family = (eax >> 8) & 0x0f;
   *model = (eax >> 4) & 0x0f;
+  *stepping = eax & 0x0f;
 }
 
 
@@ -47,6 +48,7 @@ __init_cpu_features (void)
   unsigned int edx;
   unsigned int family = 0;
   unsigned int model = 0;
+  unsigned int stepping = 0;
   enum cpu_features_kind kind;
 
   __cpuid (0, __cpu_features.max_cpuid, ebx, ecx, edx);
@@ -56,7 +58,7 @@ __init_cpu_features (void)
 {
   kind = arch_kind_intel;
 
-  get_common_indeces (&family, &model);
+  get_common_indeces (&family, &model, &stepping);
 
   unsigned int eax = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].eax;
   unsigned int extended_family = (eax >> 20) & 0xff;
@@ -131,7 +133,7 @@ __init_cpu_features (void)
 {
   kind = arch_kind_amd;
 
-  get_common_indeces (&family, &model);
+  get_common_indeces (&family, &model, &stepping);
 
   ecx = __cpu_features.cpuid[COMMON_CPUID_INDEX_1].ecx;
 
@@ -176,6 +178,14 @@ __init_cpu_features (void)
 	}
 }
 
+  /* Disable Intel TSX (HLE and RTM) due to erratum HSD136/HSW136
+     on Haswell processors, to work around outdated microcode that
+     doesn't disable the broken feature by default */
+  if (kind == arch_kind_intel && family == 6 &&
+      ((model == 63 && stepping <= 2) || (model == 60 && stepping <= 3) ||
+       (model == 69 && stepping <= 1) || (model == 70 && stepping <= 1)))
+    __cpu_features.cpuid[COMMON_CPUID_INDEX_7].ebx &= ~(bit_RTM | bit_HLE);
+
   __cpu_features.family = family;
   __cpu_features.model = model;
   atomic_write_barrier ();
diff --git a/sysdeps/x86_64/multiarch/init-arch.h b/sysdeps/x86_64/multiarch/init-arch.h
index 793707a..e2745cb 100644
--- a/sysdeps/x86_64/multiarch/init-arch.h
+++ b/sysdeps/x86_64/multiarch/init-arch.h
@@ -40,6 +40,7 @@
 
 /* COMMON_CPUID_INDEX_7.  */
 #define bit_RTM		(1 << 11)
+#define bit_HLE		(1 << 4)
 
 /* XCR0 Feature flags.  */
 #define bit_XMM_state  (1 << 1)


Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Carlos O'Donell
On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
 I can live with that, and I think I can prepare a patch if you want me to.

 Here's a minimal patch to glibc that should do it (compile tested).

The GNU C Library only uses elision if built with --enable-lock-elision=yes.

All you need to do is not build glibc with this flag.

Cheers,
Carlos.


Archive: 
https://lists.debian.org/CAE2sS1jDpM8TCvQY57nM0zRz5=gmca0wvbduezqcmdhd+yh...@mail.gmail.com



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Henrique de Moraes Holschuh
On Fri, 19 Sep 2014, Carlos O'Donell wrote:
 On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
  I can live with that, and I think I can prepare a patch if you want me to.
 
  Here's a minimal patch to glibc that should do it (compile tested).
 
 The GNU C Library only uses elision if built with --enable-lock-elision=yes.
 
 All you need to do is not build glibc with this flag.

Given Jessie's expected lifetime, I'd say the blacklist is a much better
choice with the data currently available.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20140920015909.ga12...@khazad-dum.debian.net



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Carlos O'Donell
On Fri, Sep 19, 2014 at 9:59 PM, Henrique de Moraes Holschuh
h...@debian.org wrote:
 On Fri, 19 Sep 2014, Carlos O'Donell wrote:
 On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
  I can live with that, and I think I can prepare a patch if you want me to.
 
  Here's a minimal patch to glibc that should do it (compile tested).

 The GNU C Library only uses elision if built with --enable-lock-elision=yes.

 All you need to do is not build glibc with this flag.

 Given Jessie's expected lifetime, I'd say the blacklist is a much better
 choice with the data currently available.

Why not ignore this, call it a hardware problem, and let users update
the microcode if their devices are broken?

Cheers,
Carlos.


Archive: 
https://lists.debian.org/CAE2sS1gs-oQVu_e6kzR_dOY9v+m2o+0arD=foU=vr2cf4d-...@mail.gmail.com



Bug#762195: libc6: libpthread: hardware-assisted lock elision hazardous on x86

2014-09-19 Thread Henrique de Moraes Holschuh
On Sat, 20 Sep 2014, Carlos O'Donell wrote:
 On Fri, Sep 19, 2014 at 9:59 PM, Henrique de Moraes Holschuh
 h...@debian.org wrote:
  On Fri, 19 Sep 2014, Carlos O'Donell wrote:
  On Fri, Sep 19, 2014 at 6:18 PM, Henrique de Moraes Holschuh
  h...@debian.org wrote:
   On Fri, 19 Sep 2014, Henrique de Moraes Holschuh wrote:
   I can live with that, and I think I can prepare a patch if you want me 
   to.
  
   Here's a minimal patch to glibc that should do it (compile tested).
 
  The GNU C Library only uses elision if built with 
  --enable-lock-elision=yes.
 
  All you need to do is not build glibc with this flag.
 
  Given Jessie's expected lifetime, I'd say the blacklist is a much better
  choice with the data currently available.
 
 Why not ignore this, call it a hardware problem, and let users update
 the microcode if their devices are broken?

Because we are better than that.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


Archive: https://lists.debian.org/20140920041217.ga17...@khazad-dum.debian.net