Hi.
I spent last weekend -- and a few days this week -- tracking down a problem
that exists in current. I found a workaround, but I don't know what the
"proper" fix is. Digging through the VM layer and debugging with printfs was
slow -- and it's a boot-time issue, so I had to swap a lot of SD cards back and
forth :-). Hopefully someone here is better at this than me.
Here's the background/setup:
- I'm using a Raspberry Pi 3B+ in aarch64 mode
- I'm building my own kernel, but that kernel is more or less the GENERIC64
  kernel
- I've added the "brcmfmac43455-sdio.[bin,txt]" files so that I can use the
  built-in Wi-Fi
- I'm using a root file system on a RAM disk (I build a root file system
  image, put it on the SD card, and then use efiboot/bootaa64 to load the
  RAM disk)
  (I don't know if that's relevant, but I wondered if using a RAM disk
  might have timing implications)
Here's the symptom:
- After most of the boot up, but before the login prompt is presented,
the system hangs. Can't break into DDB either.
Here's what's actually going wrong:
- The if_bwfm_sdio driver is loading the firmware, using the interfaces in
  dev/firmload.
- The firmware for this device is pretty large (depending on which one you
  use [another, separate problem I hope is resolved soon :-)], between 450K
  and 600K)
- firmware_read calls vn_rdwr with a large request (a rough sketch of the
  driver-side read pattern follows after the trace below)
- that calls VOP_READ, which calls ffs_read, which then does this:
  - ubc_uiomove
    - ubc_alloc
    - breaks the request up into 64K chunks
    - completes OK
  - uiomove
    - memcpy 64K
      ==> generates a fault
- the first fault then gets handled:
  - data_abort_handler
    - uvm_fault_internal
      - ubc_fault
        - uvn_get
          - uvm_ra_request
            - NOTE: it's the first time for this file, so no read-ahead is
              actually posted, but because we got "no advice", it sets one
              up for next time
            - completes
          - VOP_GETPAGES: note it's called with the SYNCIO flag
            - genfs_getpages (called with "async" set to false)
              - uvm_findpages
                - uvm_pagealloc_strat for many new pages
                - returns all the pages, now marked busy
              - genfs_getpages_read (async is false)
                - md_read (I'm using a RAM disk), called synchronously
                - completes without error
              - completes without error
            - for each page
              - calls ubc_fault_page
              - marks each page as NOT BUSY
            - completes without error
          - completes without error
        - completes without error
      - completes without error
    - completes without error
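(Aside on that "large request": the driver-side pattern that produces it
probably looks roughly like the sketch below. This is a from-memory sketch of
typical firmload(9) usage, NOT the actual if_bwfm_sdio code; the helper name
and the driver/image strings are just illustrative.)

    /* needs <dev/firmload.h>; hypothetical helper, for illustration only */
    static int
    load_fw_sketch(void)
    {
        firmware_handle_t fh;
        off_t size;
        void *buf;
        int error;

        /* driver/image names here are illustrative */
        error = firmware_open("bwfm", "brcmfmac43455-sdio.bin", &fh);
        if (error)
            return error;

        size = firmware_size(fh);
        buf = firmware_malloc(size);

        /* one read covering the whole ~450K-600K image: one big vn_rdwr */
        error = firmware_read(fh, 0, buf, size);

        firmware_close(fh);
        if (error) {
            firmware_free(buf, size);
            return error;
        }
        /* ...hand buf/size to the device, firmware_free() when done... */
        return 0;
    }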
Everything's OK so far. But we're not done.
The ubc_uiomove has many more 64K chunks to page in. And in the next one, we
run into the problem.
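(For scale: at 64K per ubc window, a 450K-600K image is roughly 450/64 = ~7 to
600/64 = ~10 windows, so there are plenty of chunks left after the first one --
and the very next one, at offset 64K, is already enough to hit the bug.)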
When uiomove calls memcpy for the second chunk, the fault goes like this:
  - data_abort_handler
    - uvm_fault_internal
      - ubc_fault
        - uvn_get
          - uvm_ra_request
            - the previous fault set up a read-ahead request, so now we post it
            - and it gets posted from where we currently are (offset = 64K)
            - ra_startio
              - VOP_GETPAGES: but THIS TIME it's called WITHOUT the SYNCIO
                flag, which means
                - genfs_getpages: called with "async" set to TRUE
                  - uvm_findpages:
                    - uvm_pagealloc_strat for many new pages
                    - returns all the pages, now marked busy
                  - genfs_getpages_read (async is TRUE)
                    - BUT WAIT!
                      **** uvm.aiodone_queue is NULL -- IT HAS NOT BEEN STARTED YET ****
                      **** the "async" flag gets reset to false for this routine ****
                      (paraphrased in the sketch after this trace)
                    - md_read is called synchronously
                    - completes without error
                  - completes without error
                - completes without error:
                  BUT:
                  **** ALL THESE PAGES ARE STILL MARKED BUSY ****
              - completes without error
            - completes without error
          - next, uvn_get calls:
            - VOP_GETPAGES: this time, it's called with the SYNCIO flag
              - genfs_getpages (called with "async" set to false)
                - uvm_findpages
                  - finds the first page at 64K
                  - BUT THE PAGE IS MARKED BUSY (no one has marked them
                    "un-busy" yet -- and no one will)
                  - decides to wait
                  - it will wait forever, because the previous call is already
                    complete, and there's no aiodone queue processing anyway
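To spell out the root cause as I understand it, here is a paraphrase of the
behaviour -- NOT the literal genfs_io.c source, and the variable names are
approximations:

    /*
     * Paraphrase only, not the actual genfs_io.c code: the read-ahead
     * path asked for async I/O, but with no aiodone worker in existence
     * the request is quietly forced synchronous.
     */
    if (uvm.aiodone_queue == NULL)
        async = false;    /* downgrade: the I/O completes inline... */

    /*
     * ...but nothing on this read-ahead path ever revisits the pages
     * afterwards, so they stay PG_BUSY.  The very next synchronous
     * VOP_GETPAGES for the same range finds them busy and sleeps,
     * waiting for a completion that will never be delivered -- which is
     * the hang seen just before the login prompt.
     */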
Given all that, and the fact that I'm by no means an expert in any of this code
(and the last thing I'd want to muck up is the VM layer), here are some
workaround ideas. But I'd certainly love someone to provide a real fix.
1) The SIMPLEST, easiest-to-apply workaround is to change firmload to pass
"IO_ADV_ENCODE(POSIX_FADV_RANDOM)" to vn_rdwr. This forces the lower layer to
avoid posting read-ahead requests, which prevents the problem from happening.
But it's REALLY hacky as a way to avoid the problem, and it leaves the
underlying bug there for something else to hit later.
--- a/sys/dev/firmload.c
+++ b/sys/dev/firmload.c
@@ -317,7 +317,7 @@ firmware_read(firmware_handle_t fh, off_t offset, void *buf, size_t len)
 {
 	return (vn_rdwr(UIO_READ, fh->fh_vp, buf, len, offset,
-	    UIO_SYSSPACE, 0, kauth_cred_get(), NULL, curlwp));
+	    UIO_SYSSPACE, IO_ADV_ENCODE(POSIX_FADV_RANDOM), kauth_cred_get(), NULL, curlwp));
 }
2) Change the read-ahead code to short-circuit if uvm.aiodone_queue is NULL.
This also leaves an underlying bug in place, but it still might be the right
thing to do.  If uvm.aiodone_queue is NULL, the read-ahead code just generates
I/O that is handled synchronously by the lower layer anyway
(genfs_getpages_read switches to sync mode on its own), so it doesn't seem
like any performance benefit is being given up -- the "read-ahead" just
happens synchronously before any actual read happens.
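I haven't written or tested this, but the shape of it would be something like
the guard below, placed near the top of uvm_ra_request() (or maybe
ra_startio()) in uvm_readahead.c -- the exact placement is my guess:

    /*
     * Untested sketch of workaround 2: skip read-ahead entirely while
     * the aiodone worker doesn't exist yet.  Any "async" I/O posted here
     * would be forced synchronous anyway and would only leave PG_BUSY
     * pages behind for the next getpages call to sleep on.
     */
    if (uvm.aiodone_queue == NULL)
        return;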
3) Start "aiodone_queue" earlier in the sequence. I don't have a rich enough
understanding of this part of the kernel and user land startup process to know
how hard this is, or how hacky it is.
BTW, I'm ASSUMING that if uvm.aiodone_queue were present, the asynchronous
completion would somehow handle marking the pages as "not busy". But I
actually never debugged that code path, so I can't be sure that's helpful.
Of course, someone who understands the VM layer can likely dig in and figure
out what the RIGHT fix is. It seems to me like the fact that
genfs_getpages_read decides to operate synchronously on its own might need to
be detected by the other layers, as they currently behave as if their async
requests will be handled asynchronously -- and they're not. But honestly, I
never figured out if that would change things -- I never debugged the case
where the requests actually WERE being handled asynchronously.
I also don't know why others aren't seeing this -- although maybe they are. I
don't know how many people are trying out Wi-Fi on the 3B+. Without the
firmware files in the tree, maybe no one's really doing much with them yet.
Hope this is helpful.
Rob