Paul Eggert wrote : > "Philippe De Muyter" <[EMAIL PROTECTED]> writes: > > > Do you think that this new option would fit easily in the current `sort' > > implementation, and how ? > > It could be done and would I think be useful, though there would be a > few issues. > > 1. You need to handle the case of "ties" correctly. A tie occurs if > two lines compare equal: this can happen even if their byte-for-byte > contents differ. In this case, you want the output of "sort --tail > 1000" to be equivalent to "sort | tail -n 1000". As you're sorting if > a new "tie" comes in for the first line in the tail, you need to > discard that first line and insert the "tie" in the correct position > somewhere in the "tail". Ties for "sort --head" are easier, as you > can simply discard new ties for the last line in the head.
What's the current output order for "ties" ? Input order ? or something else ? Does the 'compare' function take care of that ? > > 2. Have you looked at the literature for the best data structure to > solve your problem? For example, the people doing simulations have > worked hard on algorithms to implement priority queues (where the most > important operations are "delete min" and "insert"), and this seems to > be close to what you need. Splay heaps seem to be one of the current > champions in that area. I had already implemented my ad-hoc test program using a binary heap. I will look at splay heaps it you think that could be better. > > 3 (optional). There should be no arbitrary limits, so ideally the > performance should be good even if N is quite large, so large that the > number of saved lines doesn't fit into memory. However, this might > require adding major structural complexity to your algorithm so > perhaps it isn't worth doing. If there's a portable way to detect we overflow a reasonable memory usage, then we could revert to the normal algorithm and output only the first/last N lines. Handling the '-u' option should also happen that way. > > Anyway, thanks for looking into the problem. > Thanks for your interest Philippe _______________________________________________ Bug-coreutils mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/bug-coreutils
