I'm considering writing a patch for sort.c to add a new feature, related to
a Stack Overflow question I asked (
http://stackoverflow.com/questions/14882897/what-standard-commands-can-i-use-to-print-just-the-first-few-lines-of-sorted-out
).

This would be my first patch, and this is my first time messaging a GNU
list; apologies if I'm "doing it wrong."

I use GNU sort a lot, and routinely find myself running something like:

$ sort ... | head -n 1000

This can be needlessly slow when the input is huge, because sort fully
orders every line of input even though head discards all but the first 1000.

I propose a new option, "-H, --head=NLINES", which makes sort print at most
NLINES lines of output.  Rather than acting as a filter at the end like
| head, it would let sort skip the work of ordering lines that can never
appear among the first NLINES.
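
As a rough illustration of why a bound helps: picking the smallest NLINES
lines only requires holding NLINES lines at a time, not ordering the whole
input. Below is a minimal sketch in C, assuming plain strcmp ordering (real
sort honors locale and key options) and an in-memory array of lines. The
names head_select, sift_down, and swap_ptr are hypothetical and do not
exist in sort.c; this is not how sort.c works today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: swap two line pointers. */
static void
swap_ptr (const char **x, const char **y)
{
  const char *tmp = *x;
  *x = *y;
  *y = tmp;
}

/* Hypothetical helper: restore the max-heap property downward from
   index i within heap[0..len). */
static void
sift_down (const char **heap, size_t len, size_t i)
{
  for (;;)
    {
      size_t largest = i;
      size_t l = 2 * i + 1;
      size_t r = l + 1;
      if (l < len && strcmp (heap[l], heap[largest]) > 0)
        largest = l;
      if (r < len && strcmp (heap[r], heap[largest]) > 0)
        largest = r;
      if (largest == i)
        return;
      swap_ptr (&heap[i], &heap[largest]);
      i = largest;
    }
}

/* Store the nlines smallest of lines[0..count) into out[], in ascending
   order, and return how many were stored.  out must have room for
   nlines pointers.  Assumption: ordering is plain strcmp. */
size_t
head_select (const char **lines, size_t count,
             const char **out, size_t nlines)
{
  size_t len = 0;
  if (nlines == 0)
    return 0;
  assert (out != NULL);

  for (size_t k = 0; k < count; k++)
    {
      if (len < nlines)
        {
          /* Heap not full yet: append and sift up. */
          size_t i = len++;
          out[i] = lines[k];
          while (i > 0)
            {
              size_t parent = (i - 1) / 2;
              if (strcmp (out[parent], out[i]) >= 0)
                break;
              swap_ptr (&out[parent], &out[i]);
              i = parent;
            }
        }
      else if (strcmp (lines[k], out[0]) < 0)
        {
          /* Smaller than the largest kept line: replace the root. */
          out[0] = lines[k];
          sift_down (out, len, 0);
        }
    }

  /* Heapsort the kept lines in place so out[] ends up ascending. */
  for (size_t end = len; end > 1; end--)
    {
      swap_ptr (&out[0], &out[end - 1]);
      sift_down (out, end - 1, 0);
    }
  return len;
}
```

Each input line costs at most O(log NLINES) comparisons here, instead of
taking part in a full sort. Note that this sketch is not stable, so it
would not be a drop-in match for sort's -s behavior.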

I'd like to know the procedure for submitting a patch, and how likely it is
that such a patch would be considered, before I spend time reading through
all of sort.c and working out a complete and rigorous solution (which would
amount to writing the patch itself).  From a quick glance at the source, my
current strategy would be to alter the merge nodes when this option is set
so that the number of lines kept per node is clamped to NLINES.  While less
efficient than an ideal solution, it would be more efficient than the
current behavior, and has the benefits of minimal code changes and
negligible performance impact on ordinary use when the option is not
passed.
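
To make the clamping idea concrete, here is a minimal sketch in C of a
two-way merge that stops once it has produced a given number of lines. It
is only an illustration: merge_clamped is a hypothetical name, the runs are
plain arrays compared with strcmp, and none of this mirrors the actual
merge-node structures in sort.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: merge the sorted runs a[0..na) and b[0..nb)
   into dst, but emit at most limit lines.  Returns the number of lines
   written.  Ties take from a first, so the merge stays stable. */
size_t
merge_clamped (const char **a, size_t na,
               const char **b, size_t nb,
               const char **dst, size_t limit)
{
  size_t i = 0, j = 0, n = 0;
  assert (dst != NULL || limit == 0);

  while (n < limit && (i < na || j < nb))
    {
      /* Take from a when b is exhausted, or when a's next line sorts
         no later than b's next line. */
      if (j >= nb || (i < na && strcmp (a[i], b[j]) <= 0))
        dst[n++] = a[i++];
      else
        dst[n++] = b[j++];
    }
  return n;
}
```

Since no line beyond the first NLINES of a merged run can reach the final
output, every level of the merge can be cut off at the same limit.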

All feedback welcome, thank you.

-James
