On 18/11/10 16:36, Jim Hester wrote:
> A common problem when sorting files stems from the file containing 1
> or more header lines, which should not be sorted. As of now, the
> common solution to this problem is to remove the header lines
> manually, or to output only the non-header lines with tail, awk, or
> some other program and pipe the results to sort.
Thanks for the patch!

> This was likely not
> deemed a problem when sort was only single-threaded, as the printing
> and pipe were likely still faster than the sort itself. However, with
> multi-threaded sort this results in the operation bottlenecking
> while waiting for more input from the pipe.

I'm not following the argument above. Can't one always print the header
synchronously? I.e. the `head` below is guaranteed to run before the `sort`:

  printf "z_header\nb\na\n" > file
  (head -n1 file; sort <(tail -n+2 file) <(tail -n+2 file))

Now the above is awkward and dependent on bash (one process substitution
per file), so your idea has some merit, I think.

> This common operation
> would be greatly improved if sort could simply print a user-defined
> number of lines for each file. I have made a simple patch to
> implement this feature, which I have attached to this email.

Note `join` recently got the --header option
http://lists.gnu.org/archive/html/bug-coreutils/2010-01/msg00284.html
also essentially to exclude starting lines from order comparisons.

cheers,
Pádraig.
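Where bash is not available, the same header-then-sort workaround can be sketched in POSIX sh. This is a minimal sketch, assuming every file starts with the same number N of header lines and that only the first file's header should be printed; `sort_skip_header` is a hypothetical helper name, not a coreutils command.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils):
#   sort_skip_header N FILE...
# Prints the first N lines of the first FILE unsorted, then sorts the
# remaining lines of every FILE together.
sort_skip_header() {
  n=$1
  shift
  # Header: taken from the first file only; head finishes before sort starts.
  head -n "$n" "$1"
  # Body: skip the first N lines of each file and sort them all together.
  for f in "$@"; do
    tail -n "+$((n + 1))" "$f"
  done | sort
}

printf 'z_header\nb\na\n' > file
sort_skip_header 1 file file   # prints z_header, then a, a, b, b
```

This avoids process substitution, but the body still reaches sort through a pipe, which is the cost the proposed option is meant to remove.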
