random_shuffle looks pretty hairy.

random_shuffle, as usually implemented, is based on swaps (establishing a random permutation) -- but one can also think of it as sorting with a "broken" comparison function which returns random results, so there may be a clever alternative to doing a real sort with an appended random key?

There's a non-clever way to make any sort stable, which is to add a
secondary sort key that is the original position in the list.  There
may be a clever way to do a stable mergesort without knowing the list
lengths in advance, but I don't know it.

The idea of radix-sort is that one can reduce the problem of sorting on a wide key to several sorts on narrow keys -- adding a secondary sort key is a bit self-defeating for this use.

The brute-force way to preserve stability is to reverse the reversed list in between passes, as used here with integers providing lists (and a pair of integers providing a tape):
http://en.literateprograms.org/Merge_sort_(dc)

The clever way (as used with actual tape drives) was to reverse the sense of comparison on each pass.

That leaves only the heap and permutation operations.  I really don't
know about them.

I'd guess heaps and permutations are essentially swap-based, and that swaps imply random access. (permutations are closely related to insertion/selection-sort, which are not so far away from heap sort)

Knuth Vol 3, "Sorting and Searching", might be a good reference. Anything that was good for external sorting with tapes ought to be good with lists. (indeed, being good with tapes implies that one should be able to find more cache-friendly implementations than the standard singly-linked-list)

A remarkably large share of the uses of array indexing by numerical
variables I found in both my Python code and my JavaScript code were
for some variant of the string.join problem --- given an array of N
strings, connect them with N-1 commas, except when N=0, in which case
we should connect them with N=0 commas instead of N-1 = -1 commas.

Squint at it properly, and tail-call optimization becomes an instance of string.join: instead of "a,b,c," (where x is a jump and ',' the equivalent return) we want to generate "a,b,c". The dual (context preservation when building argument lists) follows the same pattern: we only need to save/restore contexts if there are at least two overlapping evaluations.

This may be a practical reason for the last couple of decades' interest in monads: we prefer to program in an algebraic, tree-like form (code as well as data), while the iron prefers to handle linear sequences of code operating on contiguous chunks of data (op x state -> state') -- and monads provide machinery to go from tree inputs to flat results in a theoretically nice manner.

-Dave

Reply via email to