When you figure it out, could you please suggest an optimization for streaming?

Does Pipes deserialize and serialize data for the identity mappers, or does it
just "pass it through"? (Streaming converts input to text, afaik.)

- milind


----- Original Message -----
From: Owen O'Malley <[EMAIL PROTECTED]>
To: [email protected] <[email protected]>
Sent: Thu Nov 08 17:03:01 2007
Subject: sort speeds under java, c++, and streaming

I set up a little benchmark on a 39 node cluster to sort 40gb of  
random text data (generated by RandomTextWriter using key length:  
1-10 words and value length: 0-200 words, data uncompressed). The  
runtimes in minutes are:

Java:                   4:22
C++ (Pipes):            3:50
Streaming:              4:44

I was surprised to find that Pipes outperformed Java, even with the  
extra process. I suspect it was because of the buffering between the  
input and output of Pipes.

-- Owen
