Hello Mark,

Thank you for your detailed investigation and good report.

"Mark A. Grondona":
> I don't know aufs very well at all, but I don't see how it prevents
> a race when multiple threads enter aufs_nopage() at the same time.
> (Is this even possible? I can't tell for sure from my tracing)

I could also observe parallel page faults for the same file (probably a
shared object library) in a multi-threaded application.

> We have tried the latest aufs (20080512) as of this morning, and
> the problem does not seem to reproduce, but I can't understand why,
> since there was no change in aufs_nopage() (though there was a
> suspicious un-comment of memory barrier in the unused aufs_fault()).

I forgot to uncomment the memory barrier in aufs_nopage() for older
kernels. I intended to use it in this week's release. Sorry.
Have you already tried uncommenting the memory barrier (without the
serialization by mmap_sem)? Does the problem still happen?

> So, here is the trace for the failing case:
>
> 0 clomp5_wopr7(6973):->aufs_nopage (vma=0xffff8101ffcd28c8
> file=0xffff8101feb044c0(nfs)/:lib64/libm-2.5.so addr=0x00002aaaaaeef000)
> 46 clomp5_wopr7(6973): ->au_fi
> (file=0xffff8101feb04dc0[(aufs)/:lib64/libm-2.5.so],
> h_file=0xffff8101feb04dc0[(aufs)/:lib64/libm-2.5.so]

Is this line correct? The h_file should be nfs instead of aufs.

> Note that when the vma enters aufs_nopage(), its vm_file seems to be
> pointing to the underlying nfs file? Is this because another thread
> is racing against aufs_nopage() as well? Note also when filemap_nopage()

I think the race condition happened as you guessed. aufs_nopage()
handles this condition by waiting until the vm_file is reverted.
I hope the memory barrier which I forgot to uncomment will help you.

Junjiro Okajima
