Thanks very much, Chuck -- this patch fixed my problem. I noticed you removed a couple of KASSERTs -- shouldn't those cases be EVEN MORE true now than they were before? Given what I debugged, I'm wondering if the asserts would help make sure future code doesn't end up trying to do something similar...

Rob

> On Feb 7, 2020, at 4:31 PM, Chuck Silvers <[email protected]> wrote:
>
> On Thu, Feb 06, 2020 at 04:31:47PM -0800, Rob Newberry wrote:
>> Hi.
>>
>> I spent last weekend -- and a few days this week -- tracking down a problem
>> that exists in current.
>> I found a workaround, but I don't know what the "proper" fix is.
>> Digging through the VM layer and debugging with printfs was slow --
>> and it's a boot-time issue, so I had to swap a lot of SD cards back and
>> forth :-).
>> Hopefully someone here is better at this than me.
>>
>>
>> [analysis...]
>
> good job working your way through all that, this code is pretty complicated.
>
>
>> 3) Start "aiodone_queue" earlier in the sequence. I don't have a rich
>> enough understanding of this part of the kernel and user land startup
>> process to know how hard this is, or how hacky it is.
>
> this is the right way to fix it. please try the attached patch.
>
>
>> BTW, I'm ASSUMING that if uvm.aiodone_queue were present, the asynchronous
>> completion would somehow handle marking the pages as "not busy".
>> But I actually never debugged that code path, so I can't be sure
>> that's helpful.
>
> right, the "aiodone_queue" workqueue will call uvm_aiodone_worker() on the
> buffer, and bp->b_iodone will have been set to uvm_aio_aiodone, which
> unbusies the pages among other things.
>
> -Chuck
> <diff.aiodone_queue.1>
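
P.S. To check my own understanding of the completion path Chuck describes, here is a toy userland model of it. This is only a sketch, not the kernel code: every toy_* name is made up for illustration, the structs are stand-ins for the real buffer and page structures, and treating "no queue yet" as "the completion never runs" is my assumption about the failure mode, not something I traced. The assert in the callback is the kind of check I had in mind when asking about the KASSERTs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); NOT the real struct buf / struct vm_page. */
struct toy_page { bool busy; };

struct toy_buf {
    void (*b_iodone)(struct toy_buf *);   /* completion callback */
    struct toy_page *pages;
    size_t npages;
};

/* Plays the role described for uvm_aio_aiodone: unbusy the pages. */
static void toy_aio_aiodone(struct toy_buf *bp)
{
    for (size_t i = 0; i < bp->npages; i++) {
        assert(bp->pages[i].busy);        /* pages should still be busy here */
        bp->pages[i].busy = false;
    }
}

/* Plays the role described for uvm_aiodone_worker(): run the callback. */
static void toy_aiodone_worker(struct toy_buf *bp)
{
    bp->b_iodone(bp);
}

/* Toy work queue; a NULL queue models "aiodone_queue not started yet". */
struct toy_queue { struct toy_buf *pending[8]; size_t n; };

/* Returns false when there is no queue (or it is full): nothing is scheduled. */
static bool toy_enqueue(struct toy_queue *q, struct toy_buf *bp)
{
    if (q == NULL || q->n == 8)
        return false;
    q->pending[q->n++] = bp;
    return true;
}

/* Drain the queue, handing each buffer to the worker. */
static void toy_run_queue(struct toy_queue *q)
{
    for (size_t i = 0; i < q->n; i++)
        toy_aiodone_worker(q->pending[i]);
    q->n = 0;
}
```

In this model, calling toy_enqueue() with a NULL queue leaves the pages busy forever, while enqueueing on a real queue and then calling toy_run_queue() clears the busy flags via b_iodone.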
