https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=291059
Bug ID: 291059
Summary: CAM/CTL ioctl frontend fails to validate initiator ID
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: [email protected]
Reporter: [email protected]
While working on improving virtio-scsi, I accidentally screwed things up in the
bhyve userspace code, and that led to a panic in CTL:
```
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex CTL LUN (CTL LUN) r = 0 (0xfffff816e1081800) locked @ /usr/src/sys/cam/ctl/ctl.c:12274
stack backtrace:
#0 0xffffffff80c0624c at witness_debugger+0x6c
#1 0xffffffff80c07460 at witness_warn+0x430
#2 0xffffffff810e1bec at trap_pfault+0x8c
#3 0xffffffff810b37c8 at calltrap+0x8
#4 0xffffffff8273cc47 at ctl_run+0x87
#5 0xffffffff82751f23 at ctl_ioctl_io+0x173
#6 0xffffffff80a09861 at devfs_ioctl+0xd1
#7 0xffffffff811ad061 at VOP_IOCTL_APV+0x51
#8 0xffffffff80cadcd0 at vn_ioctl+0x160
#9 0xffffffff80a09f2e at devfs_ioctl_f+0x1e
#10 0xffffffff80c0c201 at kern_ioctl+0x2a1
#11 0xffffffff80c0beff at sys_ioctl+0x12f
#12 0xffffffff810e2989 at amd64_syscall+0x169
#13 0xffffffff810b40bb at fast_syscall_common+0xf8
Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address = 0xfffffe023f28ea68
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8273cfba
stack pointer = 0x28:0xfffffe019753e940
frame pointer = 0x28:0xfffffe019753ead0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 10033 (vtscsi:0-14)
rdi: 00000000000ba34d rsi: fffffe023ecbd000 rdx: 000000005d1a6400
rcx: 0000000000007f71 r8: 0000000000000001 r9: ffffffff81e54920
rax: 000000005d1a6c00 rbx: fffffe0200e5f000 rbp: fffffe019753ead0
r10: 0000000000000000 r11: 0000000000000001 r12: 0000000000000000
r13: fffffe0200e5f000 r14: fffff816e1081800 r15: ffffffff8276a0d0
trap number = 12
panic: page fault
cpuid = 9
time = 1763225206
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe019753e670
vpanic() at vpanic+0x136/frame 0xfffffe019753e7a0
panic() at panic+0x43/frame 0xfffffe019753e800
trap_pfault() at trap_pfault+0x47c/frame 0xfffffe019753e870
calltrap() at calltrap+0x8/frame 0xfffffe019753e870
--- trap 0xc, rip = 0xffffffff8273cfba, rsp = 0xfffffe019753e940, rbp = 0xfffffe019753ead0 ---
ctl_scsiio_precheck() at ctl_scsiio_precheck+0x31a/frame 0xfffffe019753ead0
ctl_run() at ctl_run+0x87/frame 0xfffffe019753eaf0
ctl_ioctl_io() at ctl_ioctl_io+0x173/frame 0xfffffe019753ebc0
devfs_ioctl() at devfs_ioctl+0xd1/frame 0xfffffe019753ec10
VOP_IOCTL_APV() at VOP_IOCTL_APV+0x51/frame 0xfffffe019753ec40
vn_ioctl() at vn_ioctl+0x160/frame 0xfffffe019753ecb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe019753ecd0
kern_ioctl() at kern_ioctl+0x2a1/frame 0xfffffe019753ed40
sys_ioctl() at sys_ioctl+0x12f/frame 0xfffffe019753ee00
amd64_syscall() at amd64_syscall+0x169/frame 0xfffffe019753ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe019753ef30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1fd2863b2fba, rsp = 0x1fd6dabfaf28, rbp = 0x1fd6dabfaf70 ---
KDB: enter: panic
[ thread pid 10033 tid 101113 ]
Stopped at kdb_enter+0x33: movq $0,0x121d032(%rip)
```
So the system panicked in `ctl_scsiio_precheck()`, which I would have assumed
was there to make sure that the SCSI I/O request sent to CTL is valid. Why
would it panic on an invalid I/O request?
The code leading up to the panic is this:
```
ctl_scsiio_precheck+0x2fe: movl 0xc(%rbx),%edx
ctl_scsiio_precheck+0x301: movl 0x10(%rbx),%eax
ctl_scsiio_precheck+0x304: shll $0xb,%eax
ctl_scsiio_precheck+0x307: addl %edx,%eax
ctl_scsiio_precheck+0x309: cmpl $0x3,%esi
ctl_scsiio_precheck+0x30c: jz ctl_scsiio_precheck+0x338
ctl_scsiio_precheck+0x30e: movq 0x90(%r14),%rsi
ctl_scsiio_precheck+0x315: movl %eax,%edi
ctl_scsiio_precheck+0x317: shrl $0xb,%edi
ctl_scsiio_precheck+0x31a: movq (%rsi,%rdi,8),%rsi
```
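Going back to the register dump from the panic, we see that `%rsi` contains the
pointer to an array, and `%rdi` is used as the index. Its value at the time was
0xba34d, so before being shifted right by 0xb (11 bits) it was 0x5d1a6800 plus
something below 0x800; the unshifted value is still in `%rax`: 0x5d1a6c00. With
8-byte elements, the load reads 0xfffffe023ecbd000 + 0xba34d * 8 =
0xfffffe023ecbd000 + 0x5d1a68 = 0xfffffe023f28ea68, which is exactly the fault
virtual address.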
This is the source fragment where the panic happened:
```c
	initidx = ctl_get_initindex(&ctsio->io_hdr.nexus);
	/*
	 * If we've got a request sense, it'll clear the contingent
	 * allegiance condition. Otherwise, if we have a CA condition for
	 * this initiator, clear it, because it sent down a command other
	 * than request sense.
	 */
	if (ctsio->cdb[0] != REQUEST_SENSE) {
		struct scsi_sense_data *ps;
		ps = lun->pending_sense[initidx / CTL_MAX_INIT_PER_PORT];
		if (ps != NULL)
			ps[initidx % CTL_MAX_INIT_PER_PORT].error_code = 0;
	}
```
So we're taking `initidx` from the I/O header and using it to index into
`lun->pending_sense`. With an index value of 0xba34d
(`initidx / CTL_MAX_INIT_PER_PORT`), we reach far beyond the end of the array,
and we can actually consider ourselves lucky that this causes a panic right
away instead of the kernel finding some non-NULL garbage there and writing a
zero through it.
This is what `ctl_get_initindex()` looks like:
```c
uint32_t
ctl_get_initindex(struct ctl_nexus *nexus)
{
	return (nexus->initid + (nexus->targ_port * CTL_MAX_INIT_PER_PORT));
}
```
So it's really just calculating a flat index from the `initid` and `targ_port`
given in the `ctl_nexus` structure. Let's look at the `ctsio` containing that
`ctl_nexus`, which we got from the ioctl call. From the disassembly we know
that its address is in `%r13` (and in `%rbx`, which the fragment above
dereferences), which is 0xfffffe0200e5f000.
```
db> ex/x 0xfffffe0200e5f000,10
0xfffffe0200e5f000: 0 1 0 5d1a6400
```
Now, this looks familiar, doesn't it? The fourth word, at offset 0xc (the
offset read by `movl 0xc(%rbx),%edx`), is 0x5d1a6400, the value we saw in
`%rdx`. The userspace code passed 0x5d1a6400 as `initid` in
`ctl_io->io_hdr.nexus`, apparently because I screwed up in the bhyve code. But
the kernel really should have validated this input from userspace, making sure
the `initid` is actually within reasonable limits (below
`CTL_MAX_INIT_PER_PORT`), before using it to form an index into an in-kernel
array.