Thank you for the detailed report!

I've added these controllers to the quirk list. With ahcisata_pci.c rev 1.63
and later, the AHCISATA_EXTRA_DELAY kernel option is no longer required.

Thanks,
rin

On 2022/05/27 15:02, Matthias Petermann wrote:
Hello Rin,

the option AHCISATA_EXTRA_DELAY seems to fix the problem for both systems below.

As discussed, here are the two dmesg outputs:

  - dmesg.nuc5.txt: from my NUC5 with AHCI and a Seagate hard disk.

  - dmesg.fujitsu.txt: from my Esprimo, with AHCI and wd2 (Seagate) and wd3 
(WD).

A few more notes:

  - On the NUC, I had temporarily swapped out the hard drive in the meantime. In the 
process, the problem became harder to reproduce. Before I "moved" the 
cables, I could see the problem on every boot. Now it happens only sporadically 
(even with the original hard drive installed).

  - On the Esprimo, where the error occurs at almost every cold boot, 
according to my observations, both mechanical hard disks are always affected 
(wd2 and wd3). The SSDs (wd0 and wd1), on the other hand, are always detected 
correctly.

More generally, the state of the cabling seems to contribute at least somewhat 
to the problems. With the NUC, unplugging and plugging in changed the 
probability of occurrence. With the Fujitsu, I have noticed the problems more often 
since I installed a 4x SATA dock. Still, the problem is almost certainly related to 
the AHCI SATA delay, since it only occurs with NetBSD 
9.99.x and not with 9.2 or FreeBSD/Linux.

Especially with the Fujitsu, however, I had already exchanged cables several times 
beforehand and tried different things, because I had initially suspected a pure cabling 
problem. However, it seems to me at the moment that the cabling at most changes the 
timing, and that the timing is so close to the edge that the problem sometimes occurs 
and sometimes does not.

Kind regards
Matthias


On 2022/05/24 18:23, Rin Okuyama wrote:
Hi,

The recent change for probe timing should only affect ahcisata(4).
Is your SATA controller ahcisata(4)? If so,

(1) please try kernel built with:

---
options AHCISATA_EXTRA_DELAY
---

If it works around the problem,

(2) please send us full dmesg of your machine.

Then, we can add your controller to the quirk list. Once it is
registered in the list, the AHCISATA_EXTRA_DELAY option is no longer
required.
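
For step (1), a minimal sketch of a custom kernel config could look like the
following. The amd64 port and the config name MYKERNEL are assumptions for
illustration only; they are not taken from this thread, so adjust them to
your machine.

```
# Hypothetical file: sys/arch/amd64/conf/MYKERNEL
# (port and file name are assumptions; use your own port's GENERIC)
include "arch/amd64/conf/GENERIC"

# Work around the ahcisata(4) probe timing problem
options 	AHCISATA_EXTRA_DELAY
```

Such a config can typically be built from the top of the source tree with
something like `./build.sh -U -m amd64 kernel=MYKERNEL`, provided the
toolchain has already been built.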

Thanks,
rin

On 2022/05/25 0:49, Matthias Petermann wrote:
A small addendum: disabling the Intel Platform Trust technology in the BIOS did 
not help me (had read this in another post of the linked thread).

However, by plugging in additional USB devices (a mouse) I apparently caused the 
necessary delay, which the disk would have needed in the first place to execute the 
WDCTL_RST without errors. This "workaround" is a shaky one though, an extremely 
close call. I don't even want to think about what I would do with a production server 
if this happened to me on a reboot.

Kind regards
Matthias


On 2022/05/24 17:31, Matthias Petermann wrote:

Hello all,

with one of the newer builds of 9.99 (unfortunately I can't narrow it down 
more) I have a problem on a NUC5 with a Seagate Firecuda SATA hard drive 
(hybrid HDD/SSD).

As long as I boot from the USB stick (for installation, as well as later for 
booting the kernel with root redirected to the wd0) the hard drive wd0 is 
recognized correctly and works without problems.

When I boot directly from the wd0 hard drive, I get through the boot loader 
fine, and it still loads the kernel correctly into memory. However, during 
hardware detection, the initialization of wd0 then fails:

```
WDCTL_RST failed for drive 0
wd0: IDENTIFY failed
```

This error pattern does not seem to be rare. The closest match is probably 
this post:

http://mail-index.netbsd.org/current-users/2022/03/01/msg042073.html

Recent changes to the SATA autodetection timing are mentioned there. This would 
fit my experience, since I had the problem neither with 9.1 (build from 
02/16/2021) nor with older 9.99 versions. Does anyone know more specifics about 
this timing thing, as well as known workarounds if there are any? I have 
several NUC5s with exactly this model of hard drive that have been running stably 
for several years. It would be a shame if I had to replace them for such a reason.

Many greetings
Matthias
