unarchive 14752

On 28/05/14 08:15, Pádraig Brady wrote:
> On 05/26/2014 10:10 PM, Pádraig Brady wrote:
>> On 05/26/2014 10:00 PM, Azat Khuzhin wrote:
>>>> So the issue here is that sort is allocating
>>>> a large buffer up front thus impacting the fork().
>>>> Really sort(1) should be trying to avoid this issue
>>>> in the first place, and the issue is already logged at:
>>>> http://bugs.gnu.org/14752
>>>
>>> Yes this is the same as I linked above.
>>> Does any body have a patch for this, or should I start working on this?
>>
>> I was waiting for a patch that didn't materialize.
>> I'll have a look myself now.
>
> So I had a look and the change while definitely worth doing
> is a bit invasive and so probably not appropriate for the impending release,
> as that's focusing on bug fixes rather than performance characteristics.
>
> Some implementation notes for reference...
>
> vfork() is portable only when one essentially just does an
> execve() right after the vfork(). Therefore just for fire and forget
> processes.
> Anything where you need to interact with the sub process like setting up files
> to communicate etc. is going to have portability issues. Even using execvp()
> is problematic I understand. Also sort is multithreaded which might further
> complicate things.
>
> Leveraging posix_spawn() is promising, as the main reason for that
> interface is to provide an efficient fork()+exec() mechanism.
> That can be implemented using vfork() or with clone() on Linux.
> The implementation in glibc is only a token one however as it
> just uses fork()+exec() usually. One can override with the
> POSIX_SPAWN_USEVFORK flag, but there are many non obvious
> implementation gotchas with doing that. Again being multithreaded
> may complicate things here. Note the musl, freebsd, osx and solaris
> posix_spawn() implementations are efficient, which would be
> another reason to use this assuming the glibc/gnulib implementation
> is fixed up.

Progress on the glibc posix_spawn() front:
https://sourceware.org/ml/libc-alpha/2016-02/msg00016.html

>
> Another option would be for sort(1) to start up a helper child process,
> before we allocate much memory. Then we could communicate descriptors
> back and forth to that, and it could deal with forking the children.
> That would be portable too, but a little involved. Ideally we could
> keep the complication within posix_spawn() instead.
>
> Note this is a general issue not just related to sort(1).
> Many servers for example whether written in java/python/ruby or whatever
> have this issue when they use lots of memory and would like to
> popen() something. So fixing up the glibc posix_spawn() implementation
> would be very useful so that popen() etc. within glibc and the
> various language runtimes could leverage.
>
> Some links for reference:
>
> http://www.oracle.com/technetwork/server-storage/solaris10/subprocess-136439.html
> https://sourceware.org/ml/libc-help/2010-10/msg00001.html
> https://sourceware.org/bugzilla/show_bug.cgi?id=10354
> https://sourceware.org/bugzilla/show_bug.cgi?id=378
> http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c
> http://ewontfix.com/7/
> http://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux
> http://stackoverflow.com/questions/8152076/spawn-process-from-multithreaded-application
> https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c
> http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/
> http://code.google.com/p/popen-noshell/source/browse/trunk/popen_noshell.c
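
For illustration only, here is a minimal sketch of the posix_spawn() route
discussed above: starting a child with its stdout connected to a pipe, using
file actions rather than code run between fork() and exec(). The function
name spawn_with_pipe() is hypothetical and this is not taken from sort.c;
it only assumes the standard POSIX posix_spawnp() and
posix_spawn_file_actions_*() interfaces. Whether it avoids the cost of
duplicating a large address space depends on the libc implementation.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper, for illustration only.
   Spawn argv[0] (looked up in PATH) with its stdout connected to a pipe.
   On success return 0, store the child's pid in *pid and the read end
   of the pipe in *out_fd.  On failure return an errno value.  */
static int
spawn_with_pipe (char *const argv[], pid_t *pid, int *out_fd)
{
  int fds[2];
  if (pipe (fds) != 0)
    return errno;

  posix_spawn_file_actions_t fa;
  int err = posix_spawn_file_actions_init (&fa);
  if (err == 0)
    {
      /* File actions are applied in the child before the exec step,
         so no user code has to run between "fork" and "exec".  */
      err = posix_spawn_file_actions_adddup2 (&fa, fds[1], STDOUT_FILENO);
      if (err == 0)
        err = posix_spawn_file_actions_addclose (&fa, fds[0]);
      if (err == 0)
        err = posix_spawn_file_actions_addclose (&fa, fds[1]);
      if (err == 0)
        err = posix_spawnp (pid, argv[0], &fa, NULL, argv, environ);
      posix_spawn_file_actions_destroy (&fa);
    }

  /* The parent never writes to the pipe.  */
  close (fds[1]);
  if (err != 0)
    close (fds[0]);
  else
    *out_fd = fds[0];
  return err;
}
```

A caller would read from *out_fd until EOF and then waitpid() on the
returned pid. Note that some implementations report an exec failure only
through the child's exit status (127), not through the return value.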