Thanks very much, Chuck -- this patch fixed my problem.

I noticed you removed a couple of KASSERTs -- shouldn't those cases be EVEN
MORE true now than they were before?  Given what I debugged, I'm wondering if
the asserts would help make sure future code doesn't end up trying to do
something similar...

Rob

> On Feb 7, 2020, at 4:31 PM, Chuck Silvers <[email protected]> wrote:
> 
> On Thu, Feb 06, 2020 at 04:31:47PM -0800, Rob Newberry wrote:
>> Hi.
>> 
>> I spent last weekend -- and a few days this week -- tracking down a problem
>> that exists in current.  I found a workaround, but I don't know what the
>> "proper" fix is.  Digging through the VM layer and debugging with printfs
>> was slow -- and it's a boot-time issue, so I had to swap a lot of SD cards
>> back and forth :-).  Hopefully someone here is better at this than me.
>> 
>> 
>> [analysis...]
> 
> good job working your way through all that, this code is pretty complicated.
> 
> 
>> 3) Start "aiodone_queue" earlier in the sequence.  I don't have a rich
>> enough understanding of this part of the kernel and user land startup
>> process to know how hard this is, or how hacky it is.
> 
> this is the right way to fix it.  please try the attached patch.
> 
> 
>> BTW, I'm ASSUMING that if uvm.aiodone_queue were present, the asynchronous
>> completion would somehow handle marking the pages as "not busy".  But I
>> actually never debugged that code path, so I can't be sure that's helpful.
> 
> right, the "aiodone_queue" workqueue will call uvm_aiodone_worker() on the
> buffer, and bp->b_iodone will have been set to uvm_aio_aiodone, which
> unbusies the pages among other things.
> 
> -Chuck
> <diff.aiodone_queue.1>
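
For anyone following along, here is a rough, self-contained sketch of the
completion path Chuck describes above: a worker invokes the buffer's
b_iodone callback, and the callback unbusies the pages.  This is NOT the
NetBSD code.  Every type and function name starting with "sketch_" is a
hypothetical stand-in, and the asserts are only illustrative; they are not
the KASSERTs that the patch removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VM page; only tracks a "busy" flag. */
struct sketch_page {
	bool busy;
};

/* Hypothetical stand-in for an I/O buffer with a completion callback. */
struct sketch_buf {
	void (*b_iodone)(struct sketch_buf *);
	struct sketch_page *pages;
	size_t npages;
};

/* Plays the role of the completion callback: unbusy every page. */
static void
sketch_aio_aiodone(struct sketch_buf *bp)
{
	for (size_t i = 0; i < bp->npages; i++) {
		/* Illustrative check: pages under I/O should be busy. */
		assert(bp->pages[i].busy);
		bp->pages[i].busy = false;
	}
}

/* Plays the role of the workqueue worker: run the buffer's callback. */
static void
sketch_aiodone_worker(struct sketch_buf *bp)
{
	assert(bp->b_iodone != NULL);
	bp->b_iodone(bp);
}

/* Start a pretend async I/O: mark pages busy and arm the callback. */
static void
sketch_start_async_io(struct sketch_buf *bp, struct sketch_page *pages,
    size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		pages[i].busy = true;
	bp->pages = pages;
	bp->npages = npages;
	bp->b_iodone = sketch_aio_aiodone;
}

/* Count how many pages are still marked busy. */
static size_t
sketch_count_busy(const struct sketch_page *pages, size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		if (pages[i].busy)
			n++;
	return n;
}
```

In this sketch, if sketch_aiodone_worker() never runs for a buffer, its
pages simply stay marked busy, since nothing else clears the flag.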
