On 08/29/2012 09:50 PM, Assaf Gordon wrote:
> Hello,
>
> I'd like to suggest a new feature to sort: the ability to set the buffer size
> (-S/--buffer-size X) using an environment variable.
>
> In summary:
> $ export SORT_BUFFER_SIZE=20G
> $ someprogram | sort -k1,1 > output.txt
> # sort will use 20G of RAM, as if "--buffer-size 20G" was specified.
>
>
> The rationale:
> recent commits improved the guessed buffer size when sort is given an input
> file, but these don't apply if sort is used as part of a pipeline, with a
> pipe as input, e.g.
> some | program | sort | other | programs > file
>
> (Tested with v8.19 on Linux 2.6.32: sort consumes a few MBs of RAM, even
> though many GBs are available.)
> This results in many small temporary files being created.
>
> The script (which uses sort) is not under my direct control, but even if it
> was, I don't want to hard-code the amount of memory used, to keep it portable
> to different servers.
>
> AFAIK, there are four aspects of sort that affect performance:
> 1. number of threads:
> changeable with "--parallel=X" and with environment variable OMP_NUM_THREADS.
>
> 2. temporary files location:
> changeable with "--temporary-directory=DIR" and with environment variable
> TMPDIR.
>
> 3. memory usage:
> changeable with "--buffer-size=SIZE" but not with environment variable.
>
> 4. compression program:
> changeable with "--compress-program=PROG" but not with environment
> variable.
> (but at the moment, I do not address this aspect).
>
>
> With the attached patch, sort will read an environment variable named
> "SORT_BUFFER_SIZE", and will treat it as if "--buffer-size" was specified
> (but only if "--buffer-size" wasn't used on the command line).
>
> If this is conceptually acceptable, I'll prepare a proper patch (with NEWS,
> help, docs, etc.).
>
> Regards,
> -gordon

Thanks for the detailed rationale. However, the existing environment
variables are significant to more utilities than sort(1); i.e. they're
generally system-level settings rather than command-level ones.

Also, sort -S is very portable, even though it is not standardised.
Solaris' sort(1) has -S, and GNU sort, which is used on most other
platforms, has had -S available since TEXTUTILS-2_0_10-58-gbf86c62.

Note also this thread on the selection of a default buffer size for pipes:
http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=887

So currently I'd be 70:30 against adding such a variable.

cheers,
Pádraig.
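
For reference, a similar effect can probably be had today without patching sort, since -S is already available. The following is only a minimal sketch, assuming a shell function (the name sort_with_env_buffer is hypothetical) that can be used in place of a bare sort call; SORT_BUFFER_SIZE is the variable name from the proposal, and sort itself does not read it. Unlike the proposed patch, the sketch makes no attempt to detect a -S already given on the command line.

```shell
#!/bin/sh
# Hypothetical wrapper: SORT_BUFFER_SIZE is the name used in the proposal,
# not a variable that sort itself consults.
sort_with_env_buffer() {
    if [ -n "${SORT_BUFFER_SIZE:-}" ]; then
        # Variable is set and non-empty: forward it to sort as -S.
        sort -S "$SORT_BUFFER_SIZE" "$@"
    else
        # Variable unset or empty: leave sort's default buffer sizing alone.
        sort "$@"
    fi
}
```

With this in place, the example from the proposal would read "export SORT_BUFFER_SIZE=20G" followed by "someprogram | sort_with_env_buffer -k1,1 > output.txt". This only helps where the calling script can be pointed at the wrapper, which may not hold for the script described above.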