Thanks, Amogh. But my case is slightly different. The command line takes 2
input files: file1 and file2. In the mapper I need to tell which file each
line came from:
# In mapper
while (<STDIN>) {
    # how to tell whether the current line is from file1 or file2?
}
The jobconf's map.input.file param does not seem to help in this case,
because file1 and file2 are both inputs.
-Steve
--- On Thu, 10/23/08, Amogh Vasekar <[EMAIL PROTECTED]> wrote:
From: Amogh Vasekar <[EMAIL PROTECTED]>
Subject: RE: Is there a way to know the input filename at Hadoop Streaming?
To: [EMAIL PROTECTED]
Date: Thursday, October 23, 2008, 12:11 AM
Personally I haven't worked with streaming, but I guess the jobconf's
map.input.file param should do it for you.
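A minimal sketch of how that might look in a Perl mapper. It assumes that Hadoop Streaming exports jobconf values to the mapper as environment variables with the dots replaced by underscores, so map.input.file shows up as map_input_file (mapreduce_map_input_file on newer releases), and that the default file-based input format is in use, so each map task reads a split from exactly one file. The helper name current_input_file and the tab-separated tag are illustrative choices only, not part of Hadoop.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Return the input path of the current map task, or undef if it is not set.
# Assumption: jobconf keys arrive as environment variables with dots turned
# into underscores (map.input.file -> map_input_file).
sub current_input_file {
    my ($env) = @_;
    for my $key (qw(map_input_file mapreduce_map_input_file)) {
        return $env->{$key} if defined $env->{$key} && length $env->{$key};
    }
    return;
}

# A map task processes one split, and a split belongs to one file,
# so the name is looked up once, outside the read loop.
my $path = current_input_file(\%ENV);
my $tag  = defined $path ? basename($path) : 'unknown';

while (my $line = <STDIN>) {
    chomp $line;
    # Prefix every record with the name of the file it came from.
    print "$tag\t$line\n";
}
```

With both -input file1 and -input file2 on the command line, each map task would still see only one of the two paths, so the tag tells the reducer where a record originated.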
-----Original Message-----
From: Steve Gao [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 23, 2008 7:26 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Is there a way to know the input filename at Hadoop Streaming?
I am using Hadoop Streaming. The input consists of multiple files.
Is there a way to get the current filename in the mapper?
For example:
$HADOOP_HOME/bin/hadoop \
jar $HADOOP_HOME/hadoop-streaming.jar \
-input file1 \
-input file2 \
-output myOutputDir \
-mapper mapper \
-reducer reducer
In the mapper:
while (<STDIN>) {
    # how to tell whether the current line is from file1 or file2?
}