I've now committed my fixes for the NVMe driver. It should be more stable
now, so please give it a try.

With those fixes, the driver works without any problems, even under
fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with -O0,
on both virtual and real MP machines. A -O2 kernel also works on the
virtual machine, but I've had an I/O lockup on the real hardware with
a -O2 kernel. It may have been unrelated; I'm still investigating.
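
To illustrate the failure mode described in the quoted message below
(completion entries being delivered twice), here is a minimal, standalone
sketch of phase-tag based completion queue processing. It is not the code
from nvme.c: struct cqe, struct cq, cq_init(), dev_post(), cq_complete(),
count_done() and CQ_DEPTH are all hypothetical names made up for this
example, and the "device" is simulated in software. The only point is that
the host must re-read the phase bit from memory for every entry (hence the
volatile) and must advance head and flip its expected phase consistently,
otherwise a stale entry can look new and be completed a second time.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_DEPTH  4     /* hypothetical ring size, kept tiny for the sketch */
#define CQE_PHASE 0x1   /* phase tag: bit 0 of the completion status field */

struct cqe {                    /* simplified completion queue entry */
	uint16_t cid;               /* command identifier */
	volatile uint16_t status;   /* written by the device, re-read on each use */
};

struct cq {
	struct cqe ring[CQ_DEPTH];
	uint32_t head;    /* host side: next entry to consume */
	uint32_t tail;    /* simulated device side: next entry to fill */
	uint16_t hphase;  /* phase value the host treats as "new" */
	uint16_t dphase;  /* phase value the simulated device writes */
};

static unsigned completions[16]; /* per-cid counter, stands in for lddone() */

static void count_done(uint16_t cid) { completions[cid % 16]++; }

static void cq_init(struct cq *q)
{
	for (int i = 0; i < CQ_DEPTH; i++) {
		q->ring[i].cid = 0;
		q->ring[i].status = 0;
	}
	q->head = q->tail = 0;
	q->hphase = q->dphase = CQE_PHASE; /* first pass through the ring uses 1 */
}

/* Simulated device: post one completion, flipping the phase on wrap. */
static void dev_post(struct cq *q, uint16_t cid)
{
	q->ring[q->tail].cid = cid;
	q->ring[q->tail].status = q->dphase;
	if (++q->tail == CQ_DEPTH) {
		q->tail = 0;
		q->dphase ^= CQE_PHASE;
	}
}

/* Host: deliver each new entry exactly once; returns the number delivered. */
static int cq_complete(struct cq *q, void (*done)(uint16_t))
{
	int n = 0;

	assert(q->head < CQ_DEPTH);
	for (;;) {
		struct cqe *e = &q->ring[q->head];

		/* An entry is new only if its phase bit matches the expected one. */
		if ((e->status & CQE_PHASE) != q->hphase)
			break;
		done(e->cid);
		n++;
		if (++q->head == CQ_DEPTH) {
			q->head = 0;
			q->hphase ^= CQE_PHASE; /* entries from the old pass now look stale */
		}
	}
	return n;
}
```

In this sketch, calling cq_complete() a second time without any new
dev_post() delivers nothing, because the next entry still carries the phase
value of the previous pass. Note that volatile only forces the re-read; a
real driver also has to take care of memory ordering and DMA
synchronization, which this example leaves out entirely.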

Jaromir

2016-10-18 22:01 GMT+02:00 Jaromír Doleček <jaromir.dole...@gmail.com>:
> Hey,
>
> thank you. This iostat_unbusy panic is a typical symptom of the current
> MP issues: the command completion queue gets corrupted, and
> nvme_q_complete() delivers some commands twice. This causes either this
> panic (due to a duplicate lddone() for a stale buf) or a random kernel
> crash.
>
> I've been working on debugging this for the past two weeks or so. I have
> some local changes (mainly some volatile qualifiers) which seem to
> fix this issue, at least on my MP VirtualBox test machine. But these
> changes still do not fix the issue completely on another, real system I
> have access to. I guess it would be useful to at least share the ongoing
> work. I'll polish and commit what I have today or tomorrow.
>
> Jaromir
>
> 2016-10-18 10:40 GMT+02:00 Masanobu SAITOH <msai...@execsw.org>:
>> On 2016/09/22 5:54, Jaromír Doleček wrote:
>>>
>>> Hello,
>>>
>>> The NVMe driver in NetBSD-current was recently tweaked to fix several MP
>>> and locking issues, and the driver is now marked as MPSAFE by default.
>>>
>>> Most of this work was done on emulators since I lack the hardware, so
>>> it's not clear whether everything will work properly on real systems too.
>>>
>>> If you have the hardware, I'd appreciate it if you could check the driver
>>> out, try to punish the drive with some heavy I/O tests (with parallel load
>>> if possible), and report the results.
>>>
>>> The driver should work on i386 and amd64, and is enabled in the
>>> INSTALL/GENERIC kernels there, so you could just try to boot the install
>>> ISO from the NetBSD daily builds, and send-pr any issues.
>>>
>>> I'd also especially welcome it if someone with a sparc64 system could test
>>> the driver. The driver originates from OpenBSD, where nvme(4) is enabled in
>>> the GENERIC sparc64 kernel, so it should work, but this has not yet been
>>> confirmed on NetBSD/sparc64. Note that you might need a fairly modern
>>> system: at least some Intel NVMe cards require PCIe Generation 3 to
>>> actually work, which rules out e.g. T1s.
>>>
>>> I'd also very much welcome any benchmark results; it would be very
>>> interesting to share some IOPS figures.
>>>
>>> Let me know the results; I'd like to update the driver manpage to list
>>> known working hardware.
>>>
>>> In any reports, please include the attachment fragment from dmesg, as
>>> there is quite a significant difference between attachment via apic/INTx
>>> and MSI/MSI-X. The intrctl(8) output would also be useful, to confirm that
>>> interrupt handlers are dispatched properly to the individual available CPUs.
>>>
>>> Thank you.
>>>
>>> Jaromir
>>>
>>
>> With nvme.c rev. 1.16:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN
>>
>>
>> and,
>>
>>> five# crash -M netbsd.36.core -N /netbsd
>>> Crash version 7.99.39, image version 7.99.39.
>>> System panicked: iostat_unbusy
>>> Backtrace from time of crash is available.
>>> crash> trace
>>> _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
>>> ?() at ffff80008f0e5240
>>> vpanic() at vpanic+0x149
>>> snprintf() at snprintf
>>> iostat_isbusy() at iostat_isbusy
>>> dk_done1() at dk_done1+0xab
>>> lddone() at lddone+0xf
>>> nvme_q_complete() at nvme_q_complete+0xc6
>>> softint_dispatch() at softint_dispatch+0xd3
>>> DDB lost frame for Xsoftintr+0x4f, trying 0xfffffe810e919ff0
>>> Xsoftintr() at Xsoftintr+0x4f
>>> --- interrupt ---
>>> 0:
>>
>>
>> Again, the panic message was:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN
>>
>>
>> -> panic: iostat_unbusy
>> -> WARNINWG:A RSNPILN GNO:T  SLPOLW E RN
>>
>>   -> WARNING: SPL NOT LOWER
>>   -> WARNING: SPL N
>>
>> The full dmesg is at:
>>
>>         http://www.netbsd.org/~msaitoh/nvme-20161018-0.log
>>
>> Any test code is welcome!
>>
>> --
>> -----------------------------------------------
>>                 SAITOH Masanobu (msai...@execsw.org
>>                                  msai...@netbsd.org)
