Doesn't the sorting and merging all still happen in Java-land?

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 08, 2007 5:03 PM
To: [email protected]
Subject: sort speeds under java, c++, and streaming

I set up a little benchmark on a 39-node cluster to sort 40 GB of random text data (generated by RandomTextWriter using key length: 1-10 words and value length: 0-200 words, data uncompressed). The runtimes in minutes are:

Java: 4:22
C++ (Pipes): 3:50
Streaming: 4:44

I was surprised to find that Pipes outperformed Java, even with the extra process. I suspect it was because of the buffering between the input and output of Pipes.

-- Owen
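
For a rough sense of scale, the three runtimes above can be compared against the Java run. This is only a sketch: it assumes the figures are minutes:seconds (the message just says "minutes"), and the helper names `to_seconds` and `relative_to_java` are made up for illustration.

```python
# Assumption: each runtime is written as minutes:seconds.
runtimes = {"Java": "4:22", "C++ (Pipes)": "3:50", "Streaming": "4:44"}


def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds(runtimes["Java"])


def relative_to_java(name: str) -> float:
    """Percent difference in runtime versus the Java run (negative = faster)."""
    return (to_seconds(runtimes[name]) - baseline) / baseline * 100.0


for name in runtimes:
    print(f"{name}: {to_seconds(runtimes[name])} s, "
          f"{relative_to_java(name):+.1f}% vs Java")
```

Under that reading, Pipes finishes about 12% faster than Java and Streaming about 8% slower. These are single runs, so the gaps say nothing about run-to-run variance.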
