http://prohadoop.ning.com/forum/topics/passing-whole-file-to-map

On Sat, Jan 23, 2010 at 8:41 AM, Edward Capriolo <[email protected]> wrote:

> My bible code problem is somewhat similar. I have many small files, and
> one mapper needs to process an entire file. So I generate an input
> file that lists them, one per line:
>
> /user/bc/ecapriolo/bible1/grid/10/0,dictionary.txt
> /user/bc/ecapriolo/bible1/grid/10/1,dictionary.txt
> /user/bc/ecapriolo/bible1/grid/10/2,dictionary.txt
>
> Then I use NLineInputFormat:
>
>    JobConf conf = new JobConf(getConf(), GridSearcher.class);
>    conf.setJobName("GridSearcher");
>    conf.setMapperClass(MapClass.class);
>    conf.setInputFormat(NLineInputFormat.class);
>    conf.setMapOutputKeyClass(Text.class);
>    conf.setMapOutputValueClass(Text.class);
>    FileInputFormat.setInputPaths(conf, new Path("/user/bc/gridsearchcmd.txt"));
>    FileOutputFormat.setOutputPath(conf, new Path("/user/bc/gridsearchres"));
>
> Now each mapper opens the file named in its input line and processes
> the entire file using FSDataInputStream. It is an anti-pattern, but my
> map input is NOT the data, line by line. It is only the names of the
> files to open. One map, one file.
>
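The control-file idea above can be tried locally without a cluster. This is only a minimal sketch, not the GridSearcher code: the class and method names (`WholeFilePerRecord`, `map`, `run`) are made up for illustration, plain `java.nio` stands in for Hadoop's `FileSystem` and `FSDataInputStream`, and it assumes each line of the control file is exactly one path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Local, Hadoop-free sketch of the control-file pattern: every record
// handed to map() is a file name, and map() reads that file in full.
// All names here are hypothetical and not taken from the thread.
class WholeFilePerRecord {

    // Stand-in for the mapper body. In a real job the path would be opened
    // through Hadoop's FileSystem API rather than java.nio.
    static String map(String pathLine) throws IOException {
        Path file = Paths.get(pathLine.trim());
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for the framework: one map() call per non-empty line of the
    // control file, so one call sees one whole file.
    static Map<String, String> run(Path controlFile) throws IOException {
        Map<String, String> results = new LinkedHashMap<>();
        for (String line : Files.readAllLines(controlFile, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines in the control file
            }
            results.put(line.trim(), map(line));
        }
        return results;
    }
}
```

Calling `WholeFilePerRecord.run(...)` on a control file returns one entry per listed path, keyed by that path, with the full file contents as the value. The trade-off is the same as in the real job: the framework only sees file names, so it cannot schedule the work close to the data.
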
> On Sat, Jan 23, 2010 at 9:54 AM, Raymond Jennings III
> <[email protected]> wrote:
> > Not sure if this solves your problem but I had a similar case where there
> > was unique data at the beginning of the file and if that file was split
> > between maps I would lose that for the 2nd and subsequent maps. I was able
> > to pull the file name from the conf and read the first two lines for every
> > map.
> >
> > --- On Sat, 1/23/10, stolikp <[email protected]> wrote:
> >
> >> From: stolikp <[email protected]>
> >> Subject: Passing whole text file to a single map
> >> To: [email protected]
> >> Date: Saturday, January 23, 2010, 9:49 AM
> >>
> >> I've got some text files in my input directory and I want to pass each
> >> text file (the whole file, not just a line) to a map (one file per map).
> >> How can I do this? TextInputFormat splits text into lines and I do not
> >> want this to happen.
> >> I tried:
> >>
> >> http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
> >> but it doesn't work for me; the compiler doesn't know what
> >> NonSplitableTextInputFormat.class is.
> >> I'm using Hadoop 0.20.1.
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27286204p27286204.html
> >> Sent from the Hadoop core-user mailing list archive at
> >> Nabble.com.



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
