On Sun, Aug 21, 2011 at 08:07:08AM +0200, Laurent Bercot wrote:
>
> > A library cannot use popen without documenting this so the caller
> > knows. This is because a caller that's not expecting to have child
> > processes except the ones it creates might install a signal handler
> > for SIGCHLD that just immediately reaps all children if it doesn't
> > care about their exit status.
>
> [...]
> If someone has trouble using a library that forks, the problem does not
> lie in the library, but in the SIGCHLD handler.
I'd like to know how else, other than this ugly SIGCHLD handler, you can
create a child process without having to keep track of the pid and make
some part of your code responsible for waiting for it -- especially if
any process in the program could have created the child process...

> > You might call such calling programs badly coded, but it's a common
> > idiom.
>
> Not so common. Unless they're all coded by people who believe that
> starting all their programs by blocking all signals except a few chosen
> ones is a good programming practice, see what I mean ? ;)
>
> > The alternative, if you want to make a "detached child process"
> > (one you don't have to keep track of and later wait for) is to "double
> > fork"
>
> It is a possibility, but it is not necessary.
> Also, the cost of fork() is not as high as you pretend it is. With COW,
> very few pages are copied at fork() invocation, and if the child dies or
> execs soon afterwards, no more copying happens. On Linux, fork() and
> pthread_create() are even based on the *same* system call, clone(), so
> the difference is really small if fork() does not have that much data
> to duplicate.

I've timed it. fork() takes at least twice as long as pthread_create,
and that's including the time pthread_create spends on mmap, mprotect,
etc. Double-fork therefore takes 4 times as long. COW does not save you
from copying the page tables, nor from the cost of commit charge. This
has nontrivial time and memory cost in a large application that wants to
spawn new threads/processes to handle a new io channel or a command that
can't be completed immediately (which was the original use case that led
to this discussion).

> > In short, processes are *not* easy to use. They have ugly corner
> > cases like this all over the place.
>
> Oh, please. Your arguments against multiprocessing are that processes
> are tricky to use and have ugly corner cases, and instead you are
> advocating the use of *threads* ?
> We must not be living in the same world.
> In my world, processes are MUCH easier to use than threads ! The
> inconveniences you mentioned are *nothing* compared to the headaches of
> properly synchronizing your threads, locking exactly what must be locked
> for the necessary duration, etc.

Complex locking is rarely required or useful. Threaded code, just like
multiprocess code, will have no locks at all unless it uses semaphores
or condition variables as high-performance replacements for
filesystem-based or SysV-style IPC. Even in programs that need locking,
in *most cases* well-written code does not hold a lock for more than a
few lines of code, and does not hold multiple locks at once.

> When you use processes, the kernel protects you. It's on your side.
> The system calls are designed so that a sequence that should intuitively
> work actually works most of the time; there is no need for explicit
> synchronization, and if your shared data is accessed through the
> filesystem, file locking is easy to use.
> You have no such protection from the kernel when you use threads. You
> are on your own. Everything is fair game. Talk about ugly corner cases !

I agree there are some cases (complex handling of third-party data)
where these benefits (security) greatly outweigh the advantages of
threads -- but only if you actually use real sandboxes with unique uids,
or that Linux extension that disables all syscalls except
pure-computation-related ones. With that said, I don't write or advocate
writing code that relies on SIGSEGV to cover up its bugs...

> I guess most of it comes from practice and habit. I am very much used
> to multiprocess programming, and find it easy. You are probably very
> much used to multithread programming, that's why you find it easy.

Actually, no. As I said when this discussion started, I used to strongly
dislike threads, but always got stung by issues with multiprocess
programming.
So I went and really learned pthreads (mostly by implementing it) and
found it a much saner, cleaner interface.

> > That does not mean you have to use it at all. It's completely possible
> > to write a multi-threaded program where each thread never accesses any
> > object except its own automatic variables and memory it allocated
> > itself with malloc.
>
> Then you might as well use separate processes, and have the kernel
> guarantee your threads won't walk all over one another, at the cost of
> a little more memory and a little more CPU time at thread creation.

I already explained how this could lead to *unbounded* memory growth
through leaks that are nearly impossible to fix, unless all your forking
happens from a common parent that doesn't grow.

Rich

_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
