Hi.
I spent last weekend -- and a few days this week -- tracking down a problem
that exists in current. I found a workaround, but I don't know what the
"proper" fix is. Digging through the VM layer and debugging with printfs was
slow -- and it's a boot-time issue, so I had to swap a lot of SD cards back and
forth :-). Hopefully someone here is better at this than me.
Here's the background/setup:
- I'm using a Raspberry Pi 3B+ in aarch64 mode
- I'm building my own kernel, but that kernel is more or less the GENERIC64
  kernel
- I've added the "brcmfmac43455-sdio.[bin,txt]" files so that I can use the
  built-in Wi-Fi
- I'm using a root file system on a RAM disk (I build a root file system
  image, put it on the SD card, and then use efiboot/bootaa64 to load the
  RAM disk)
  (I don't know if that's relevant, but I wondered if using a RAM disk
  might have timing implications)
Here's the symptom:
- After most of the boot up, but before the login prompt is presented,
the system hangs. Can't break into DDB either.
Here's what's actually going wrong:
- The if_bwfm_sdio driver is loading the firmware, using the interfaces in
  dev/firmload.
- The firmware for this device is pretty large (depending on which one you
  use [another, separate problem I hope is resolved soon :-)], between 450K
  and 600K)
- firmware_read calls vn_rdwr with a large request (a rough sketch of the
  driver-side read pattern follows after the trace below)
- that calls VOP_READ, which calls ffs_read, which then does this:
  - ubc_uiomove
    - ubc_alloc
    - breaks the request up into 64K chunks
    - completes OK
  - uiomove
    - memcpy 64K
      ==> generates a fault
- the first fault then gets handled:
  - data_abort_handler
    - uvm_fault_internal
      - ubc_fault
        - uvn_get
          - uvm_ra_request
            - NOTE: it's the first time for this file, so no read-ahead is
              actually posted, but because we got "no advice", it sets one
              up for next time
            - completes
          - VOP_GETPAGES: note it's called with the SYNCIO flag
            - genfs_getpages (called with "async" set to false)
              - uvm_findpages
                - uvm_pagealloc_strat for many new pages
                - returns all the pages, now marked busy
              - genfs_getpages_read (async is false)
                - md_read (I'm using a RAM disk), called synchronously
                - completes without error
              - completes without error
            - for each page
              - calls ubc_fault_page
              - marks each page as NOT BUSY
            - completes without error
          - completes without error
        - completes without error
      - completes without error
    - completes without error
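(Aside on that "large request": the driver-side pattern that produces it
probably looks roughly like the sketch below. This is a from-memory sketch of
typical firmload(9) usage, NOT the actual if_bwfm_sdio code; the helper name
and the driver/image strings are just illustrative.)

    /* needs <dev/firmload.h>; hypothetical helper, for illustration only */
    static int
    load_fw_sketch(void)
    {
        firmware_handle_t fh;
        off_t size;
        void *buf;
        int error;

        /* driver/image names here are illustrative */
        error = firmware_open("bwfm", "brcmfmac43455-sdio.bin", &fh);
        if (error)
            return error;

        size = firmware_size(fh);
        buf = firmware_malloc(size);

        /* one read covering the whole ~450K-600K image: one big vn_rdwr */
        error = firmware_read(fh, 0, buf, size);

        firmware_close(fh);
        if (error) {
            firmware_free(buf, size);
            return error;
        }
        /* ...hand buf/size to the device, firmware_free() when done... */
        return 0;
    }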
Everything's OK so far. But we're not done.
The ubc_uiomove has many more 64K chunks to page in. And in the next one, we
run into the problem.
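(For scale: at 64K per ubc window, a 450K-600K image is roughly 450/64 = ~7 to
600/64 = ~10 windows, so there are plenty of chunks left after the first one --
and the very next one, at offset 64K, is already enough to hit the bug.)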
When uiomove calls memcpy for the second chunk, the fault goes like this:
  - data_abort_handler
    - uvm_fault_internal
      - ubc_fault
        - uvn_get
          - uvm_ra_request
            - the previous fault set up a read-ahead request, so now we post it
            - and it gets posted from where we currently are (offset = 64K)
            - ra_startio
              - VOP_GETPAGES: but THIS TIME it's called WITHOUT the SYNCIO
                flag, which means
                - genfs_getpages: called with "async" set to TRUE
                  - uvm_findpages:
                    - uvm_pagealloc_strat for many new pages
                    - returns all the pages, now marked busy
                  - genfs_getpages_read (async is TRUE)
                    - BUT WAIT!
                      **** uvm.aiodone_queue is NULL -- IT HAS NOT BEEN STARTED YET ****
                      **** the "async" flag gets reset to false for this routine ****
                      (paraphrased in the sketch after this trace)
                    - md_read is called synchronously
                    - completes without error
                  - completes without error
                - completes without error:
                  BUT:
                  **** ALL THESE PAGES ARE STILL MARKED BUSY ****
              - completes without error
            - completes without error
          - next, uvn_get calls:
            - VOP_GETPAGES: this time, it's called with the SYNCIO flag
              - genfs_getpages (called with "async" set to false)
                - uvm_findpages
                  - finds the first page at 64K
                  - BUT THE PAGE IS MARKED BUSY (no one has marked them
                    "un-busy" yet -- and no one will)
                  - decides to wait
                  - it will wait forever, because the previous call is already
                    complete, and there's no aiodone queue processing anyway
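To spell out the root cause as I understand it, here is a paraphrase of the
behaviour -- NOT the literal genfs_io.c source, and the variable names are
approximations:

    /*
     * Paraphrase only, not the actual genfs_io.c code: the read-ahead
     * path asked for async I/O, but with no aiodone worker in existence
     * the request is quietly forced synchronous.
     */
    if (uvm.aiodone_queue == NULL)
        async = false;    /* downgrade: the I/O completes inline... */

    /*
     * ...but nothing on this read-ahead path ever revisits the pages
     * afterwards, so they stay PG_BUSY.  The very next synchronous
     * VOP_GETPAGES for the same range finds them busy and sleeps,
     * waiting for a completion that will never be delivered -- which is
     * the hang seen just before the login prompt.
     */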
Given all that, and the fact that I'm by no means an expert in any of this code
(and the last thing I'd want to muck up is the VM layer), here are some
workaround ideas. But I'd certainly love someone to provide a real fix.
1) The SIMPLEST, easiest-to-apply workaround is to change firmload to pass
"IO_ADV_ENCODE(POSIX_FADV_RANDOM)" to vn_rdwr. This forces the lower layer to
avoid posting read-ahead requests, which prevents the problem from happening.
But it's REALLY hacky as a way to avoid the problem, and it leaves the
underlying bug there for something else to hit later.
--- a/sys/dev/firmload.c
+++ b/sys/dev/firmload.c
@@ -317,7 +317,7 @@ firmware_read(firmware_handle_t fh, off_t offset, void *buf, size_t len)
 {
 	return (vn_rdwr(UIO_READ, fh->fh_vp, buf, len, offset,
-	    UIO_SYSSPACE, 0, kauth_cred_get(), NULL, curlwp));
+	    UIO_SYSSPACE, IO_ADV_ENCODE(POSIX_FADV_RANDOM), kauth_cred_get(), NULL, curlwp));
 }
2) Change the read-ahead code to short-circuit if uvm.aiodone_queue is NULL.
This also leaves an underlying bug in place, but it still might be the right
thing to do.  If uvm.aiodone_queue is NULL, the read-ahead code just generates
I/O that is handled synchronously by the lower layer anyway
(genfs_getpages_read switches to sync mode on its own), so it doesn't seem
like any performance benefit is being given up -- the "read-ahead" just
happens synchronously before any actual read happens.
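I haven't written or tested this, but the shape of it would be something like
the guard below, placed near the top of uvm_ra_request() (or maybe
ra_startio()) in uvm_readahead.c -- the exact placement is my guess:

    /*
     * Untested sketch of workaround 2: skip read-ahead entirely while
     * the aiodone worker doesn't exist yet.  Any "async" I/O posted here
     * would be forced synchronous anyway and would only leave PG_BUSY
     * pages behind for the next getpages call to sleep on.
     */
    if (uvm.aiodone_queue == NULL)
        return;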
3) Start "aiodone_queue" earlier in the sequence. I don't have a rich enough
understanding of this part of the kernel and user land startup process to know
how hard this is, or how hacky it is.
BTW, I'm ASSUMING that if uvm.aiodone_queue were present, the asynchronous
completion would somehow handle marking the pages as "not busy". But I
actually never debugged that code path, so I can't be sure that's helpful.
Of course, someone who understands the VM layer can likely dig in and figure
out what the RIGHT fix is. It seems to me like the fact that
genfs_getpages_read decides to operate synchronously on its own might need to
be detected by the other layers, as they currently behave as if their async
requests will be handled asynchronously -- and they're not. But honestly, I
never figured out if that would change things -- I never debugged the case
where the requests actually WERE being handled asynchronously.
I also don't know why others aren't seeing this -- although maybe they are. I
don't know how many people are trying out Wi-Fi on the 3B+. Without the
firmware files in the tree, maybe no one's really doing much with them yet.
Hope this is helpful.
Rob