Re: Passing whole text file to a single map

Alex Kozlov Sat, 23 Jan 2010 16:30:55 -0800

By design, TextInputFormat splits the file into lines and passes each line
as a record.


If you override isSplitable() to return false, each file will become a
single split, but the record reader will still return one record per line.

If you want to get the contents of a single file, the best way is to put the
files into a SequenceFile, one record per file, with the file name as the key
and the file contents stored as bytes.

Alternatively, you can pass a file where each line is a file name to a
mapper and open the file explicitly within the mapper.
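
To illustrate the second option, here is a minimal sketch of the per-record
logic in plain Java, with no Hadoop classes involved. The class and method
names (FileNamePerLine, readWholeFile, mapAll) are made up for the example,
it reads from the local file system through java.nio, and it assumes the
files are UTF-8 text. In a real job the body of readWholeFile would sit
inside your mapper and would open the path through Hadoop's FileSystem API
instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the mapper logic. All names here are hypothetical
// and nothing in this class depends on Hadoop.
final class FileNamePerLine {

    // Treats one input record (a line of the listing file) as a path and
    // returns the complete contents of that file. UTF-8 is an assumption.
    static String readWholeFile(String line) throws IOException {
        Path path = Paths.get(line.trim());
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Simulates the framework calling map() once per line of the listing:
    // each non-blank line yields one (file name, whole file) pair.
    static Map<String, String> mapAll(List<String> lines) throws IOException {
        Map<String, String> out = new LinkedHashMap<>();
        for (String line : lines) {
            String name = line.trim();
            if (name.isEmpty()) {
                continue; // skip blank lines in the listing
            }
            out.put(name, readWholeFile(name));
        }
        return out;
    }
}
```

The point is that the input the framework splits is only the small listing
file, so splitting it into lines is harmless; each map call then sees one
whole file regardless of how that file would have been split.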

On Sat, Jan 23, 2010 at 8:48 AM, prashant ullegaddi <
[email protected]> wrote:

> Why don't you extend FileInputFormat, and implement
> isSplittable<
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29
> >,
> so that it returns false.
>
>
> On Sat, Jan 23, 2010 at 10:05 PM, stolikp <[email protected]> wrote:
>
> >
> > I've got some text files in my input directory and I want to pass each
> > single
> > text file (whole file not just a line) to a map (one file per one map).
> How
> > can I do this ? TextInputFormat splits text into lines and I do not want
> > this to happen.
> > I tried:
> >
> >
> http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
> > but it doesn't work for me, compiler doesn't know what
> > NonSplitableTextInputFormat.class is.
> > I'm using hadoop 0.20.1
> > --
> > View this message in context:
> >
> http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27287649p27287649.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> >
>
>
> --
> Thanks,
> Prashant Ullegaddi,
> Search and Information Extraction Lab,
> IIIT-Hyderabad, INDIA.
>
