Sorry for the email, and thanks in advance for any help or hints. I am using Hadoop Streaming, and the input consists of multiple files. Is there a way to get the name of the file currently being read from inside the mapper?
For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer
In the mapper:
while (my $line = <STDIN>) {
    # How can I tell whether the current line came from file1 or file2?
}
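One possible approach, sketched here as an assumption rather than something confirmed in this thread: Hadoop Streaming passes job configuration values to the mapper process as environment variables, with the dots in the property name replaced by underscores. If that holds for the version in use, the input file of the current split should be readable from `mapreduce_map_input_file` (or `map_input_file` on older releases). The function name `tag_lines` and the `unknown` fallback below are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: prefix every input line with the name of the file it came from.
# Assumption: Hadoop Streaming exports job configuration values as environment
# variables with dots replaced by underscores, so mapreduce.map.input.file
# (map.input.file on older releases) is visible under the names used below.
tag_lines() {
    # Fall back to "unknown" when neither variable is set (e.g. a local test run).
    input_file="${mapreduce_map_input_file:-${map_input_file:-unknown}}"
    while IFS= read -r line; do
        # Emit "<input file><TAB><original line>".
        printf '%s\t%s\n' "$input_file" "$line"
    done
}
# In the actual mapper script, end with a bare call to the function:
# tag_lines
```

In a Perl mapper, the same value would be read from `%ENV` under the same variable names, for example `$ENV{mapreduce_map_input_file}`, before entering the `while` loop.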
