On Fri, Oct 23, 2015, at 11:13, Carlos Alberto Lopez Perez wrote:
> I was having trouble (crashes with the NVIDIA proprietary driver) on a
> Debian system with an i7-5775C and libc6=2.19-18+deb8u1 (stable)
This is very, very likely to be braindamage in the NVIDIA driver, though. Are you sure that driver is not doing something as idiotic as unlocking an already unlocked mutex?

The proper fix in that case is _always_ to fix whatever is broken, because eventually it will run on something that has working hardware lock elision... and crash.

> I tried first to update the Intel microcode with the "unreleased" 0x13
> microcode version but it didn't disable the TSX-NI instructions [1]
> nor fix the crashes.

Mobile Broadwell-H seems to disable TSX, while Desktop Broadwell-H doesn't. That's why we blacklisted the whole thing: the behavior is inconsistent on the same microcode, and that behavior is itself inconsistent with the errata sheet, which says such processors shouldn't even be able to advertise Intel TSX RTM in CPUID.

At the moment, we don't even know what is wrong with RTM in Broadwell/Broadwell-H/Broadwell-DE. We do know some of what is wrong with HLE in Broadwell/-H/-DE (and it is really nasty), but glibc does not use HLE in the first place. The HLE erratum is also supposedly worked around somehow (it is documented to be so on the Xeon D-1500/Broadwell-DE) by the batch of microcode updates available in the kernel bugzilla report mentioned in this bug report.

Broadwell-H microcode 0x13 is useful anyway, because it fixes other critical errata that hang or oops the kernel: your box should be a _lot_ more stable with it. And at least one person reported that microcode 0x12 did not fix all of the hangs, so you should probably keep using microcode 0x13 (or newer, should one become available).

> Finally I upgraded to glibc=2.21-0experimental2 and it fixed the crashes.

It is just as likely that it merely "works around" a bug in the NVIDIA driver; see above. If we instrumented non-lock-elision glibc to complain about operations that are illegal on most processors implementing lock elision, we'd know for sure.

> Should this patch be backported both to stable and unstable?
It needs to go to stable sooner rather than later, yes. But it seems wise to let it cook in unstable/testing for a bit first. I don't know what the plans are for uploading a new glibc to unstable.

--
"One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot
Henrique de Moraes Holschuh

