On 12/7/08 11:32 PM, "Andy Sautins" <[EMAIL PROTECTED]> wrote:
>
>
> I'm having trouble finding a way to do what I want, so I'm wondering
> if I'm just not looking at the right place or if I'm thinking about the
> problem in the wrong way. Any insight would be appreciated.
>
>
>
> Let's say I have a directory of files that contains a combination of
> different file types. The MapReduce job needs to process all files in
> the directory but generates different key/value pairs depending on the
> file being processed. What I'd like to do is use the filename to
> identify the file type being processed and use that information in the
> map job. It seems that what I want is for the map job to have access
> to the filename of the input file split being processed. I haven't been
> able to find out if that is available to a derived class of
> MapReduceBase.
>
>
The filename is available in the map task as the map.input.file property of
the JobConf. Your mapper class has to override configure(JobConf) from
MapReduceBase and read the filename with JobConf.get("map.input.file"). Store
it in a field of your mapper class; you can then inspect it in your map method.
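
A rough sketch of the filename-inspection step, in case it helps. The helper
name InputFileKind, the Kind enum, and the .log/.csv suffixes are made up for
illustration, and I am assuming map.input.file holds a full path. The Hadoop
wiring is shown only in comments so the snippet compiles with just the JDK.

```java
import java.util.Locale;

/** Hypothetical helper: maps the value of "map.input.file" to a file type. */
final class InputFileKind {
    enum Kind { LOG, CSV, UNKNOWN }

    private InputFileKind() {}

    /** Classifies by file extension; the extensions are illustrative assumptions. */
    static Kind fromPath(String inputFile) {
        if (inputFile == null) {
            return Kind.UNKNOWN;
        }
        // Assumed to be a full path, so look only at the last path segment.
        String name = inputFile.substring(inputFile.lastIndexOf('/') + 1)
                               .toLowerCase(Locale.ROOT);
        if (name.endsWith(".log")) return Kind.LOG;
        if (name.endsWith(".csv")) return Kind.CSV;
        return Kind.UNKNOWN;
    }
}

// Inside a mapper that extends MapReduceBase (old org.apache.hadoop.mapred API):
//
//   private InputFileKind.Kind kind;
//
//   @Override
//   public void configure(JobConf job) {
//       kind = InputFileKind.fromPath(job.get("map.input.file"));
//   }
//
//   // map(...) can then branch on `kind` to emit different key/value pairs.
```

Doing the classification once in configure, rather than on every map call,
keeps the per-record path cheap.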
>
> Does what I'm trying to do make sense or is there a better way of
> processing a job like the one I'm describing?
>
>
Look at the MultipleInputs class (in the mapred.lib package). It lets you
register a separate mapper for each input path, which could prove useful here.
>
> Thank you
>
>
>
> Andy