On 22/11/10 17:28, Pádraig Brady wrote:
> On 18/11/10 16:36, Jim Hester wrote:
>> A common problem when sorting files stems from the file containing 1
>> or more header lines, which should not be sorted. As of now, the
>> common solution to this problem is to remove the header lines
>> manually, or to output only the non-header lines with tail, awk, or
>> some other program and pipe the results to sort.
>
> Thanks for the patch!
>
>> This was likely not
>> deemed a problem when sort was only single-threaded, as the printing
>> and pipe was likely still faster than the sort itself. However, with
>> multi-threaded sort this results in the operation bottlenecking
>> while waiting for more information from the pipe.
>
> I'm not following the argument above.
> One can always print the header synchronously?
> I.E. the `head` below is guaranteed to run before the `sort`
>
>   printf "z_header\nb\na\n" > file
>   (head -n1 file; sort <(tail -n+2 file) <(tail -n+2 file))
>
> Now the above is awkward and dependent on bash
> (constructs per file), so your idea has some merit I think.

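For reference, the same "print the header first, then sort the rest" idea can be sketched in plain POSIX sh without process substitution. This is only an illustration: `sort_keep_header` is a hypothetical helper name, not anything in coreutils, and it assumes every input file carries exactly one header line.

```sh
#!/bin/sh
# Hypothetical helper: print the first file's header line, then the
# sorted bodies (line 2 onward) of all the files given as arguments.
sort_keep_header() {
    # head completes before the loop and sort start, so the header
    # is always written first.
    head -n 1 "$1"
    for f in "$@"; do
        tail -n +2 "$f"
    done | sort
}

# Example input mirroring the printf in the quoted message.
tmp=$(mktemp)
printf 'z_header\nb\na\n' > "$tmp"
sort_keep_header "$tmp"
rm -f "$tmp"
```

The body lines still reach sort through a pipe, so this does nothing for the throughput concern raised in the original post; it only removes the bash dependency.
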
Note the --header option is especially useful for `join` as it
transforms its input. sort, however, does not, and so might be
amenable to a more general solution. Perhaps something like:

  (head --no-header -n1 file.* | head -n1; tail --no-header -n+2 file.* | sort)

I.E. add a --no-header option to `head` and `tail` to suppress the
==> file name <== annotations, which would allow using `head` and
`tail` in general for this.

thanks,
Pádraig.
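
A note on the pipeline above: GNU `head` and `tail` already accept -q (--quiet, --silent), which suppresses the ==> file name <== annotations, so the idea can be tried with existing options. The sketch below assumes GNU coreutils; the temporary directory and the file.1/file.2 names are made up for illustration.

```sh
#!/bin/sh
# Assumes GNU head/tail, where -q never prints the "==> name <==" headers.
dir=$(mktemp -d)
printf 'z_header\nb\na\n' > "$dir/file.1"
printf 'z_header\nd\nc\n' > "$dir/file.2"

# Take one header line from the first file, then sort the bodies
# (line 2 onward) of all the files together.
(head -q -n1 "$dir"/file.* | head -n1; tail -q -n+2 "$dir"/file.* | sort)

rm -rf "$dir"
```
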
