On 2016-04-03, at 19:25, David Betten wrote:

> First of all, full disclaimer that I was in DFSORT development for about 8
> years so I might be biased. But I just want to share a few thoughts.
>
> First the idea of loading all the data into a large hashmap to do the sort
> tends to eliminate one very important thing and that's overlap.
> Essentially, you read the entire input, conduct your massive hashsort, and
> then write the output with no overlap of those three phases. ...
>
Strawman. Or red herring. Or some metaphor.
You seem to have deliberately made an adverse choice so you can refute it.
Rather than hash, use a B-tree so sorting fully overlaps input.

One might argue that given sufficient page data space any sort could be
performed in virtual storage. I suspect performance would be suboptimal.

I suspect that for a large enough data set Cooley-Tukey FFT brutally defies
LoR. But some of the operations in C-T are hauntingly similar to a balanced
merge. Might sorting techniques with workfiles implement a C-T that
outperforms a virtual storage implementation?

-- gil
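
The B-tree suggestion above can be sketched roughly as follows. This is only an illustration, not anything taken from DFSORT: Python's `bisect.insort` on a plain list stands in for a B-tree insert (the list shifts elements in O(n), which a real B-tree would avoid), and `read_input` is a hypothetical record source. The point is simply that ordering proceeds as each record arrives, so no separate sort phase follows the read.

```python
import bisect
from typing import Iterable, Iterator, List


def sort_while_reading(records: Iterable[str]) -> List[str]:
    """Keep records in order as each one arrives from the input."""
    ordered: List[str] = []
    for rec in records:
        # Stand-in for a B-tree insert (assumption): binary search for the
        # slot, then insert. A list pays O(n) to shift; a B-tree would not.
        bisect.insort(ordered, rec)
    return ordered


def read_input() -> Iterator[str]:
    """Hypothetical input source yielding one record at a time."""
    yield from ["delta", "alpha", "charlie", "bravo"]


result = sort_while_reading(read_input())
print(result)
```

Whether this overlap pays off once the structure no longer fits in real storage is exactly the open question in the thread; the sketch says nothing about paging behavior.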
