Hi,
I'm doing some benchmarks on my cluster including the TeraSort
benchmark to test a couple of hardware characteristics. When I was
playing with Hadoop's generator, I found that the keys generated
by Hadoop's TeraGen implementation are not the same as those produced
by the official generator located here:
http://www.ordinal.com/try.cgi/gensort.tar.gz
Here are the first 5 keys generated by Hadoop:
---------------------
.t^#\|v$2\
7...@~?'WdUF
w[o||:N&H,
^Eu)<n#kdP
+l-$$OE/ZH
---------------------
Whereas the keys generated by the official generator are:
ASCII keys:             Binary keys:
---------------------   ---------------------
AsfAGHM5om              JimGrayRIP
~sHd0jDv6X              àäb³íþG
uI^EYm8s=|              ESÛíS)6\
Q)JN)R9z-L              *Ã6+`v_
o4FoBkqERn              \«8®Rb×
---------------------   ---------------------
(Note: some bytes of the binary keys are negative and so are not
printable as a char.)
I was wondering whether Hadoop's generator is based exactly on the
official generator, or whether it is just a similar implementation
that produces different results. Could I be displaying the results
incorrectly? Here is how I display them:
    // Prints the first 10 bytes of the key, casting each byte to a char.
    private void printKey(Text key) {
        byte[] keyBytes = key.getBytes();
        for (int i = 0; i < 10; i++) {
            System.out.print((char) keyBytes[i]);
        }
    }
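
To rule out a display problem, one option might be to dump the key bytes as hex instead of casting them to char, since a negative byte is sign-extended by the cast and will not print as the intended character. Below is a minimal sketch of that idea; the class name KeyDump and the method toHex are made up for illustration and are not part of Hadoop or gensort. With a Hadoop Text key, I assume it could be called as toHex(key.getBytes(), key.getLength()).

```java
public class KeyDump {
    /** Formats the first `length` bytes of `bytes` as space-separated hex pairs. */
    static String toHex(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder();
        // Never read past the end of the array, even if `length` is too large.
        int n = Math.min(length, bytes.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so a negative byte prints as 80..ff
            // instead of being sign-extended.
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: three ASCII bytes followed by two "negative" bytes.
        byte[] sample = {'A', 's', 'f', (byte) 0xE0, (byte) 0xFE};
        System.out.println(toHex(sample, sample.length)); // prints "41 73 66 e0 fe"
    }
}
```

Comparing the hex output from both generators would show whether the keys really differ or only look different when printed.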
Thanks,
Jim