On 04.10.2022 12:47, John Baldwin wrote:
On 10/4/22 7:44 AM, Alexander Motin wrote:
The branch main has been updated by mav:
URL:
https://cgit.FreeBSD.org/src/commit/?id=a58536b91ae3931d222c3e4f1a949ff4a4927fb2
commit a58536b91ae3931d222c3e4f1a949ff4a4927fb2
Author: Alexander Motin <m...@freebsd.org>
AuthorDate: 2022-10-04 14:34:15 +0000
Commit: Alexander Motin <m...@freebsd.org>
CommitDate: 2022-10-04 14:34:15 +0000
pci: Disable Electromechanical Interlock.
Add sysctl/tunable to control Electromechanical Interlock support.
Disable it by default since Linux does not do it either and it seems
the number of systems having it broken is higher than having working.
This fixes NVMe backplane operation on ASUS RS500A-E11-RS12U server
with AMD EPYC 7402 CPU, where attempts to control reported interlock
for some reason end up in PCIe link loss, while interlock status does
not change (it is not really there).
MFC after: 2 weeks
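
To make the behaviour described in the commit message concrete, here is a
minimal sketch of gating interlock use on both the slot's advertised
capability and an opt-in knob. It is not the committed code: the names
enable_ei and slot_uses_interlock() are hypothetical, and only the bit
position (Slot Capabilities bit 17, Electromechanical Interlock Present)
is taken from the PCIe register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Capabilities bit 17: Electromechanical Interlock Present. */
#define SLOT_CAP_EIP 0x00020000u

/* Hypothetical stand-in for the new sysctl/tunable; off by default. */
static int enable_ei = 0;

/*
 * Drive the interlock only when the slot advertises one AND the
 * administrator has opted in.  With the default of 0 a (possibly bogus)
 * EIP bit is simply ignored.
 */
static bool
slot_uses_interlock(uint32_t slot_cap)
{
	return (enable_ei != 0 && (slot_cap & SLOT_CAP_EIP) != 0);
}
```

With the knob left at its default, a slot that reports an interlock is
treated as if it had none, which is the case that matters for the ASUS
backplane above.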
See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256264, though
that is more for the case where slots aren't really hotplug at all.
The root issue seems to be that there are generic HotPlug-capable
bridges but that manufacturers fail to correctly wire up the various
input pins such that the bridges can actually determine that there is no
MRL or EI, etc. The above PR (which I still can't get the reporter to
test the patch for, but perhaps should just merge?) disables PCI-e hotplug
if the link is up, but the other status bits claim that the device is
partially inserted when attaching the bridge.
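
One way to read the check described above, as a rough sketch only: at
bridge attach, an active link combined with slot status that says the card
is not fully seated is taken as evidence that the hotplug signals are not
wired up. This is an interpretation of "partially inserted", not the patch
from the PR; hotplug_state_is_plausible() is a hypothetical helper, and the
bit values follow the PCIe Slot Capabilities, Slot Status and Link Status
register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT_CAP_MRLSP 0x00000004u /* Slot Capabilities bit 2: MRL Sensor Present */
#define SLOT_STA_MRLSS 0x0020u /* Slot Status bit 5: MRL Sensor State (1 = open) */
#define SLOT_STA_PDS 0x0040u /* Slot Status bit 6: Presence Detect State */
#define LINK_STA_DL_ACTIVE 0x2000u /* Link Status bit 13: Data Link Layer Link Active */

/*
 * Return false when the registers contradict each other: the link is up,
 * yet the slot claims that no card is present or that the MRL is open.
 * A caller could then decline to enable hotplug on this bridge.
 */
static bool
hotplug_state_is_plausible(uint32_t slot_cap, uint16_t slot_sta,
    uint16_t link_sta)
{
	bool link_up = (link_sta & LINK_STA_DL_ACTIVE) != 0;
	bool present = (slot_sta & SLOT_STA_PDS) != 0;
	/* Only trust the MRL state if the slot advertises an MRL sensor. */
	bool mrl_open = (slot_cap & SLOT_CAP_MRLSP) != 0 &&
	    (slot_sta & SLOT_STA_MRLSS) != 0;

	if (link_up && (!present || mrl_open))
		return (false);
	return (true);
}
```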
In my case the slots are really expected to be hot-pluggable, ASUS just
can't do things right. In the case of the PR your patch seems to make
sense. I'd be more worried about the already present check for a broken
MRL -- if we see the MRL open, but the device is still powered, we may
wish to quickly shut the device down. But I agree that the probability
of a false negative here is much higher than of a positive. I still
haven't had my hands on any hardware implementing all those cool bells
and whistles.
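
The MRL concern can be written down as a small predicate. This is only a
sketch under assumptions: should_cut_power() is a hypothetical helper, it
presumes the MRL sensor and power controller are real whenever they are
advertised (which is exactly what broken boards violate), and the bit
values follow the PCIe Slot Capabilities, Slot Status and Slot Control
register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT_CAP_PCP 0x00000002u /* Slot Capabilities bit 1: Power Controller Present */
#define SLOT_CAP_MRLSP 0x00000004u /* Slot Capabilities bit 2: MRL Sensor Present */
#define SLOT_STA_MRLSS 0x0020u /* Slot Status bit 5: MRL Sensor State (1 = open) */
#define SLOT_CTL_PCC 0x0400u /* Slot Control bit 10: Power Controller Control (1 = off) */

/*
 * True when the latch reads open while the slot is still powered, i.e.
 * the case where powering the device down quickly would be the safe move.
 * Slots without both an MRL sensor and a power controller never qualify.
 */
static bool
should_cut_power(uint32_t slot_cap, uint16_t slot_sta, uint16_t slot_ctl)
{
	bool mrl_open, powered;

	if ((slot_cap & SLOT_CAP_MRLSP) == 0 || (slot_cap & SLOT_CAP_PCP) == 0)
		return (false);
	mrl_open = (slot_sta & SLOT_STA_MRLSS) != 0;
	powered = (slot_ctl & SLOT_CTL_PCC) == 0;
	return (mrl_open && powered);
}
```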
--
Alexander Motin