Hi Owen,

Can you provide more details about your test? In particular, what was the Java Map-reduce program that you ran? Was it src/examples/org/apache/hadoop/examples/Sort.java?

Also, I can't find anything called "RandomTextWriter" in the source tarball. Can you point me to it?

Thanks.
- Doug

On Nov 8, 2007 5:03 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> I set up a little benchmark on a 39 node cluster to sort 40gb of
> random text data (generated by RandomTextWriter using key length:
> 1-10 words and value length: 0-200 words, data uncompressed). The
> runtimes in minutes are:
>
> Java: 4:22
> C++ (Pipes): 3:50
> Streaming: 4:44
>
> I was surprised to find that Pipes out performed Java, even with the
> extra process. I suspect it was because of the buffering between the
> input and output of Pipes.
>
> -- Owen
