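One trick that needs no help from Hadoop is to encode the file identifier 
inside the file itself. For example, each line of file1 could start with 
"1", then a single space, then the content of the original line. The mapper 
can then split each line on the first space to recover which file it came from.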
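
A minimal sketch of that trick, in Python rather than the Perl used in the 
question. The function names (tag_lines, split_tag, map_lines) and the 
tab-separated output format are assumptions made for illustration, not 
anything Hadoop provides. The tagging step would run once over each input 
file before the job starts; the splitting step would run inside the mapper.

```python
def tag_lines(lines, file_id):
    """Yield each line prefixed with file_id and a single space.

    Preprocessing step: run once per input file before the job starts.
    """
    for line in lines:
        yield f"{file_id} {line}"


def split_tag(line):
    """Split a tagged line into (file_id, original_content)."""
    # Split on the first space only, so spaces in the original line survive.
    file_id, _, content = line.partition(" ")
    return file_id, content


def map_lines(stream):
    """Mapper side: turn tagged input lines into 'file_id<TAB>content' records.

    The tab-separated output format is a hypothetical choice for this sketch.
    """
    for raw in stream:
        file_id, content = split_tag(raw.rstrip("\n"))
        yield f"{file_id}\t{content}"
```

In an actual mapper script, the last step would be something like looping 
over map_lines(sys.stdin) and printing each record. The identifier must not 
contain a space itself, since the first space marks where the original line 
begins.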



----- Original Message ----
From: Steve Gao <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, October 23, 2008 1:48:11 PM
Subject: [Help needed] Is there a way to know the input filename at Hadoop 
Streaming?

Sorry for the email. Thanks for any help or hint.

    I am using Hadoop Streaming. The input consists of multiple files.
    Is there a way to get the current filename in the mapper?

    For example:
    $HADOOP_HOME/bin/hadoop  \
    jar $HADOOP_HOME/hadoop-streaming.jar \
        -input file1 \
        -input file2 \
        -output myOutputDir \
        -mapper mapper \
        -reducer reducer

    In the mapper:
    while (<STDIN>) {
      # how to tell whether the current line is from file1 or file2?
    }


      
