Pádraig Brady wrote:
> I was surprised to notice sort was accessing the disk on multiple runs on
> a 500MB file on my 2GB RAM laptop. Here was my memory situation:
>
> $ free -m | head -n2
>              total       used       free     shared    buffers     cached
> Mem:          2006        603       1403          0         67        404
> $ cat 500MB_access_log > /dev/null
> $ free -m | head -n2
>              total       used       free     shared    buffers     cached
> Mem:          2006       1095        911          0         67        895
>
> So on subsequent runs I had 911MB free, but I noticed sort was only using
> around half that. In fact, looking at the code, it was using:
>
> buf_size = MIN(rlimit, MAX(free, total/8))/2
>
> This seems a bit conservative to me, especially as RAM sizes increase,
> since more of it will tend to be dedicated to cache and thus be safer
> to use. In fact my case is a little unusual as I had just booted.
> The usual case is for free to tend to 0 over time as more files are cached.
> In other words, the rlimits are more important to stay away from than the
> other "limits". So might this be better?
[Oh! just discovered this partially written reply.
 Was interrupted and almost never made it back. Sorry about that.]

The default is intended to be conservative, e.g., in case multiple
invocations of sort happen to run in parallel, or in a multi-user
environment.

> buf_size = MIN(rlimit/2, MAX(free, total/8))
>
> I also noticed that the code in default_sort_size() assumes the
> rlimit values are unsigned, which may cause portability issues?
>
> Note the "used" value seen in the above output from `free` is
> not used in the equation at present.
>
> p.s. while testing this I noticed that sort from git with default CFLAGS
> is about 14% faster than sort from coreutils-7.2 that ships with F11.

Definitely worth investigating.

> Nothing has changed in the sort code as far as I can see, and
> also the compiler and glibc were the same.
>
> $ export LANG=C
> time sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > /dev/null
>
> real    0m28.631s
> user    0m26.866s
> sys     0m1.354s
>
> $ time ~/git/coreutils/src/sort -t ' ' -k4.9n -k4.5M -k4.2n -k4.14,4 --buffer-size=1G access_log > /dev/null
>
> real    0m24.199s
> user    0m22.707s
> sys     0m1.370s
>
> I first suspected compiler flags; however, recompiling sort.o
> as follows does not make a difference:
> $ rm sort.o && make CFLAGS="$(rpm -q --qf %{OPTFLAGS} coreutils)" V=1
>
> So I'm now guessing the i18n patch is affecting the speed even though LANG=C.
>
> p.p.s. recompiling all of coreutils with the above rpm flags fails with
> warnings like:
> cp.c:358: error: not protecting local variables: variable length buffer [-Wstack-protector]
> due to the ASSIGN_STRDUPA macro.
>
> _______________________________________________
> Bug-coreutils mailing list
> Bug-coreutils@gnu.org
> http://lists.gnu.org/mailman/listinfo/bug-coreutils