Hello up there. I've found this thread 
via https://github.com/golang/go/issues/19563#issuecomment-287920797 so let 
me chime in and comment a bit:

1. The problem of sys_dup2 blocking is not FUSE-only. By definition dup2 
may end up closing newfd:

    If the file descriptor newfd was previously open, it is silently closed before being reused.
    (http://man7.org/linux/man-pages/man2/dup2.2.html)

so this becomes a blocking operation with any filesystem that does
blocking work in .flush. One example of such a "real" filesystem is NFS:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/nfs/file.c?id=97da3854c526d3a6ee05c849c96e48d21527606c#n149
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/nfs/file.c?id=97da3854c526d3a6ee05c849c96e48d21527606c#n836
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/open.c?id=97da3854c526d3a6ee05c849c96e48d21527606c#n1108
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/file.c?id=97da3854c526d3a6ee05c849c96e48d21527606c#n858

2. Please see below inline:

On Monday, March 23, 2015 at 3:13:47 UTC+3, Aaron Jacobs
wrote:
>
> By the way, for the benefit of anybody finding this thread in the future: 
>
> I found a similar problem with taking a page fault for an mmap'd file 
> implemented in fuse. The kernel blocks with this stack: 
>
>     [<0000000000000000>] wait_answer_interruptible+0x6a/0xa0 
>     [<0000000000000000>] __fuse_request_send+0x1fb/0x280 
>     [<0000000000000000>] fuse_request_send+0x12/0x20 
>     [<0000000000000000>] fuse_readpage+0x152/0x1e0 
>     [<0000000000000000>] filemap_fault+0x116/0x410 
>     [<0000000000000000>] __do_fault+0x6f/0x530 
>     [<0000000000000000>] handle_mm_fault+0x482/0xf00 
>     [<0000000000000000>] __do_page_fault+0x184/0x560 
>     [<0000000000000000>] do_page_fault+0x1a/0x70 
>     [<0000000000000000>] page_fault+0x28/0x30 
>     [<0000000000000000>] 0xffffffffffffffff 
>
> and my test deadlocks with the faulting thread camping on the Go scheduler slot
> and the fuse thread waiting for the scheduler slot. (The workaround in the
> particular case of this test is to just raise GOMAXPROCS, of course.) 
>
> It seems like this probably comes up for files mmap'd from "real" file systems,
> too: while waiting on disk access for a page fault, the scheduler will fail to
> utilize the CPU.


Yes, it can come up with any filesystem, because on a page fault the filesystem
might need to load data, which becomes a blocking operation. It can also happen
that if there is an I/O error while loading, the kernel sends SIGBUS to the
client code (which Go converts into a runtime panic if
runtime/debug.SetPanicOnFault(true) was called).
 

> But I don't see anything obvious that can be done about it.
>

I came to the conclusion that the following has to be done around mmap access:

1. Protect the code accessing the mmap area with debug.SetPanicOnFault(true) to
catch I/O errors.
2. Wrap the actual access with runtime·entersyscall and runtime·exitsyscall to
let the Go scheduler know the code might block, so it can free up a P slot.

It should look something like this (not yet tested):

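   ---- 8< ----
   import (
       "runtime"
       "runtime/debug"
       "syscall"
   )

   // mmapWrite copies buf into the mmap'd region at pos. A fault on the
   // mapping (e.g. SIGBUS from an I/O error) is turned into EIO.
   func mmapWrite(mmapedAddr []byte, pos int, buf []byte) (err error) {
       save := debug.SetPanicOnFault(true)
       defer func() {
           if r := recover(); r != nil {
               // A fault under SetPanicOnFault(true) panics with a runtime.Error
               // (so would a bad pos/len); re-raise anything else.
               if _, ok := r.(runtime.Error); !ok {
                   panic(r)
               }
               err = syscall.EIO
           }
           runtime·exitsyscall()
           debug.SetPanicOnFault(save)
       }()

       // runtime·entersyscall/exitsyscall are runtime-internal (unexported):
       // this marks where they would go; it does not compile as-is.
       runtime·entersyscall()
       copy(mmapedAddr[pos:pos+len(buf)], buf)
       return nil
   }
   ---- 8< ----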

Thanks,
Kirill

Btw, here one can see that a single minor page fault costs ~6x _more_ than
copying one page:

http://marc.info/?l=linux-kernel&m=149002565506733&w=2

P.S. Sorry if the formatting is off; I replied via the web interface.
