When you figure it out, could you please suggest an optimization for streaming?
Does Pipes deserialize and serialize data for the identity mappers, or does it just "pass it through"? (Streaming converts input to text, afaik.)

- milind

----- Original Message -----
From: Owen O'Malley <[EMAIL PROTECTED]>
To: [email protected] <[email protected]>
Sent: Thu Nov 08 17:03:01 2007
Subject: sort speeds under java, c++, and streaming

I set up a little benchmark on a 39-node cluster to sort 40 GB of random text data (generated by RandomTextWriter with key length 1-10 words and value length 0-200 words, data uncompressed). The runtimes (minutes:seconds) are:

Java: 4:22
C++ (Pipes): 3:50
Streaming: 4:44

I was surprised to find that Pipes outperformed Java, even with the extra process. I suspect it was because of the buffering between the input and output of Pipes.

-- Owen
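To put the three runtimes on a common footing, here is a small sketch that converts them to seconds, aggregate and per-node throughput, and speedup relative to Java. It assumes "40gb" means 40 GiB (40 * 1024 MB) of input and only counts the input size once; a real sort also moves intermediate and output data, so actual I/O per node is higher than these figures. The names RUNTIMES, DATA_MB, to_seconds and summarize are hypothetical helpers for this illustration only.

```python
# Reported sort runtimes (minutes:seconds) from the benchmark above.
RUNTIMES = {
    "Java": "4:22",
    "C++ (Pipes)": "3:50",
    "Streaming": "4:44",
}

DATA_MB = 40 * 1024  # assumption: "40gb" interpreted as 40 GiB of input
NODES = 39           # cluster size from the original message


def to_seconds(mmss: str) -> int:
    """Convert an "M:SS" string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(runtimes, data_mb=DATA_MB, nodes=NODES):
    """Return seconds, input throughput, and speedup vs. Java per variant.

    Throughput counts the input size only once (hypothetical simplification).
    """
    baseline = to_seconds(runtimes["Java"])
    rows = {}
    for name, mmss in runtimes.items():
        secs = to_seconds(mmss)
        rows[name] = {
            "seconds": secs,
            "cluster_mb_per_s": data_mb / secs,
            "node_mb_per_s": data_mb / secs / nodes,
            "speedup_vs_java": baseline / secs,  # > 1 means faster than Java
        }
    return rows


if __name__ == "__main__":
    for name, r in summarize(RUNTIMES).items():
        print(f"{name:12s} {r['seconds']:4d} s  "
              f"{r['cluster_mb_per_s']:6.1f} MB/s cluster  "
              f"{r['node_mb_per_s']:5.2f} MB/s/node  "
              f"x{r['speedup_vs_java']:.2f} vs Java")
```

With these numbers Pipes finishes in roughly 12% less wall-clock time than Java (about 1.14x), while Streaming takes roughly 8% longer. Under the stated assumptions each node sorts only a few MB/s of input, which suggests per-record overhead (serialization, process boundaries, buffering) rather than raw disk bandwidth is what separates the three.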
