I use org.apache.hadoop.streaming.AutoInputFormat to handle sequence file input for streaming, but I found that it presents each <key, value> pair in the format below (the key is a string, the value is binary):

"keystring\tvalue\n"

Since the value is binary, it contains many '\n' bytes, so my mapper cannot tell where one record ends and the next begins. In other words, I need the value presented as length + raw bytes, or as typed bytes.

I invoked streaming as below:

$HADOOP_HOME/bin/hadoop jar \
    $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.2.jar \
    -input data.seq \
    -output output \
    -mapper mapper \
    -reducer reducer \
    -inputformat org.apache.hadoop.streaming.AutoInputFormat \
    -file mapper \
    -file reducer

huangs
thuhuang...@gmail.com
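
To make the "length + raw bytes" request concrete, here is a minimal sketch of the mapper-side reader I have in mind. The framing is an assumption for illustration only: each key and each value is a 4-byte big-endian unsigned length followed by that many raw bytes. Whether streaming can be made to emit exactly this layout is the open question; the names read_records, _read_field and main are hypothetical.

```python
import struct
import sys


def _read_field(stream):
    """Read one length-prefixed field; return None at a clean end of input."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated length header")
    # Assumed framing: 4-byte big-endian unsigned length, then the payload.
    (length,) = struct.unpack(">I", header)
    data = stream.read(length)
    if len(data) < length:
        raise EOFError("truncated field")
    return data


def read_records(stream):
    """Yield (key, value) byte pairs from a binary stream of framed fields."""
    while True:
        key = _read_field(stream)
        if key is None:
            return
        value = _read_field(stream)
        if value is None:
            raise EOFError("stream ended between key and value")
        yield key, value


def main():
    # Emit the key and the size of the binary value as a text line.
    for key, value in read_records(sys.stdin.buffer):
        sys.stdout.write("%s\t%d\n" % (key.decode("utf-8", "replace"), len(value)))
```

With framing like this, '\n' or '\t' bytes inside the value are harmless, because record boundaries come from the length prefix and not from delimiters. A mapper script would call main() at the end of the file.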