Yves-Alexis Perez wrote:
> since kernel 4.11 (sorry it took so long to report) I have a box
> failing to boot with a NULL pointer dereference (the box is stuck
> there afterwards).
I get the same result on a Quanta server with several 4.13 and 4.14
kernels (from the Ubuntu "mainline" and Xenial hwe-edge PPAs).
I guess this is the same problem that Stefan Priebe reported under
"isci regression in 4.11.0-rc2 by scsi: libsas: allow async aborts"
on 8 November 2017[1]. That report didn't elicit any response here.
> The bug has also been reported to the Debian BTS ([2]) and a
> suggestion to revert 90965761 has been made. I can confirm it fix the
> boot issue.
The Debian people have implemented the suggestion to revert 90965761 as
of their 4.14.12-1 kernel package[2].
> I don't have the complete stack trace at hand but there's an example
> in the Debian bug.
Here's a stack trace from my server. It was copied and pasted from a
serial console (IPMI SOL); I hope it's complete.
[ 9.184043] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 9.184055] IP: isci_task_abort_task+0x43/0x400 [isci]
[ 9.184056] PGD 0
[ 9.184056] P4D 0
[ 9.184057]
[ 9.184058] Oops: 0000 [#1] SMP
[ 9.184060] Modules linked in: aesni_intel(+) aes_x86_64 crypto_simd
glue_helper cryptd mei_me intel_cstate intel_rapl_perf mei shpchp lpc_ich
ipmi_si(+) mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_core
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf
ipmi_msghandler autofs4 btrfs xor raid6_pq ast ttm drm_kms_helper ixgbe igb
syscopyarea isci sysfillrect i2c_algo_bit dca sysimgblt libsas fb_sys_fops ptp
mdio drm scsi_transport_sas pps_core wmi
[ 9.184084] CPU: 18 PID: 434 Comm: kworker/u48:1 Not tainted 4.13.0-21-generic #24~16.04.1-Ubuntu
[ 9.184084] Hardware name: Quanta S210-X12RS V2/S210-X12RS V2, BIOS S2RQ4A08 08/12/2013
[ 9.184090] Workqueue: scsi_tmf_0 scmd_eh_abort_handler
[ 9.184091] task: ffff96507bb05d00 task.stack: ffffa2de87bb4000
[ 9.184095] RIP: 0010:isci_task_abort_task+0x43/0x400 [isci]
[ 9.184095] RSP: 0018:ffffa2de87bb7c88 EFLAGS: 00010246
[ 9.184096] RAX: 0000000000000000 RBX: ffff9650782f11a8 RCX: 0000000000000000
[ 9.184097] RDX: 0000000000000000 RSI: ffff9650782f11a8 RDI: 0000000000000000
[ 9.184097] RBP: ffffa2de87bb7e28 R08: 0000000000000000 R09: 0000000000000001
[ 9.184098] R10: 000000000000b8cb R11: 00000000000002f3 R12: ffff9650782f1148
[ 9.184098] R13: ffff9650758cb800 R14: 0000000000000008 R15: 0000000000000000
[ 9.184099] FS: 0000000000000000(0000) GS:ffff9660bf380000(0000) knlGS:0000000000000000
[ 9.184100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.184100] CR2: 0000000000000000 CR3: 000000004b009000 CR4: 00000000001406e0
[ 9.184101] Call Trace:
[ 9.184107] ? cpumask_next_and+0x31/0x50
[ 9.184110] ? load_balance+0x1b5/0x9c0
[ 9.184114] ? sched_clock+0x9/0x10
[ 9.184116] ? sched_clock+0x9/0x10
[ 9.184117] ? sched_clock+0x9/0x10
[ 9.184120] ? sched_clock_cpu+0x11/0xb0
[ 9.184121] ? pick_next_task_fair+0x3c7/0x560
[ 9.184123] ? __switch_to+0x211/0x510
[ 9.184125] ? put_prev_entity+0x27/0x100
[ 9.184129] sas_eh_abort_handler+0x30/0x50 [libsas]
[ 9.184131] scmd_eh_abort_handler+0x74/0x230
[ 9.184135] process_one_work+0x156/0x410
[ 9.184136] worker_thread+0x4b/0x460
[ 9.184138] kthread+0x109/0x140
[ 9.184139] ? process_one_work+0x410/0x410
[ 9.184140] ? kthread_create_on_node+0x70/0x70
[ 9.184143] ret_from_fork+0x25/0x30
[ 9.184144] Code: 08 48 81 ec 78 01 00 00 c7 85 78 fe ff ff 00 00 00 00 c7
85 80 fe ff ff 00 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 <48> 8b
07 48 8b 40 30 48 8b 80 90 02 00 00 4c 8b a0 28 01 00 00
[ 9.184160] RIP: isci_task_abort_task+0x43/0x400 [isci] RSP: ffffa2de87bb7c88
[ 9.184161] CR2: 0000000000000000
[ 9.184162] ---[ end trace bf9920b58fca631f ]---
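For anyone reading the trace: the <48> marker in the Code: line points at
the first byte of the faulting instruction. Below is a minimal sketch for
pulling that out, together with the symbol and offset from the RIP line.
The helper names (faulting_bytes, rip_symbol) are made up for illustration
and are not part of any kernel tool; the two input strings are excerpts
copied from the trace above.

```python
import re


def faulting_bytes(code_line, count=3):
    """Return `count` opcode bytes starting at the <xx> marker."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # The marked byte itself, then the bytes that follow it.
            return [tok.strip("<>")] + tokens[i + 1 : i + count]
    raise ValueError("no <xx> marker found in Code: line")


def rip_symbol(rip_line):
    """Return (symbol, offset, size) parsed from 'sym+0xOFF/0xSIZE'."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", rip_line)
    if m is None:
        raise ValueError("no symbol+offset/size found in RIP line")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


# Excerpts from the oops above (not the full lines).
CODE_EXCERPT = "Code: 48 89 45 d0 31 c0 <48> 8b 07 48 8b 40 30"
RIP_LINE = "RIP: 0010:isci_task_abort_task+0x43/0x400 [isci]"

print(faulting_bytes(CODE_EXCERPT))
print(rip_symbol(RIP_LINE))
```

The bytes 48 8b 07 decode to "mov (%rdi),%rax", and RDI is 0 in the
register dump. That would be consistent with the first argument of
isci_task_abort_task being NULL, assuming RDI still holds that argument
at offset 0x43.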
> The machine is a Dell Precision T5600 with the following SATA
> controllers:
> 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 05)
> 05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 05)
Mine is a Quanta S210-X12RS server with only one SATA controller:
08:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 05)
Connected to that SATA controller are two Samsung 850 EVO 250GB SSDs and
one 3TB WD Red disk.
> If you need more information or need me to test something, please ask.
Likewise.
Best regards,
--
Simon.
[1] https://marc.info/?l=linux-scsi&m=151013394701914
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882414