Hi, I realized there was a mistake on my part in my initial report.
On 2011-06-09, at 2:46 AM, Paul Eggert wrote:

> Thanks for your bug report. If 'sort' is breaking up its input into
> 4 MiB chunks, sorting them, creating a separate temp file for each
> chunk, and then merging the results with a 16-way merge, then the
> first level of 16-way merges will produce 64 MiB files, and the second
> level will produce 1 GiB temp files, which is about the size you're
> observing. Since your input is about 18 GiB in size (is that right?),
> I'd expect to see two third-level merges. The first would be a 16-way
> merge, generating about 16 GiB total. The second would be a roughly
> 2-way merge, generating about 2 GiB. Then there would be a single
> fourth-level merge of these two big files into the final 18 GiB of
> output.

The input data is not 350 million records but significantly more, to the tune of 2 billion. At ~56 bytes per record, I'd estimate the data set's size at roughly 100 GB.

I may have jumped the gun and associated the delay/looping I observed with the unrelated "hang" bug I saw earlier with compressed temp files in coreutils 8.5.

Given this new information, do you think the behaviour I observed is reasonable? Or is there still the possibility of a bug worth pursuing?

> How much RAM do you have? Is your host x86 or x86-64 or what?
> (That "4 MiB" in my example is an absurdly small number, and
> this is a performance bug in 'sort', but that's a different
> matter I think.)

This occurred on an x86_64 box with 12 GB RAM and 4 x 2.66 GHz CPUs. The disks are quite slow, as the read data, the written temp files, and the eventual output all sit on the same local RAID5 volume.

Thank you.
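For what it's worth, the merge-level arithmetic Paul describes can be sketched as a quick back-of-the-envelope calculation. This is only a hypothetical illustration of the geometry (largest temp file produced at each level), not sort's actual code; the chunk size and fan-in are the 4 MiB / 16-way figures from his example:

```python
import math

def merge_levels(input_size, chunk_size, fan_in):
    """Estimate the largest temp file produced at each merge level
    of an external merge sort with the given chunk size and fan-in."""
    chunks = math.ceil(input_size / chunk_size)
    sizes = []
    size = chunk_size
    while chunks > 1:
        chunks = math.ceil(chunks / fan_in)   # files remaining after this level
        size = min(size * fan_in, input_size) # biggest file this level can emit
        sizes.append(size)
    return sizes

MiB = 1024 ** 2
GiB = 1024 ** 3

# 18 GiB input, 4 MiB chunks, 16-way merges: temp files grow
# 64 MiB -> 1 GiB -> 16 GiB -> 18 GiB (the final output)
print([s / GiB for s in merge_levels(18 * GiB, 4 * MiB, 16)])
```

With the ~100 GB figure above instead of 18 GiB, the same sketch still predicts four merge levels, which may or may not match the looping I saw.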
