http://prohadoop.ning.com/forum/topics/passing-whole-file-to-map
On Sat, Jan 23, 2010 at 8:41 AM, Edward Capriolo <[email protected]> wrote:

> My bible code problem is somewhat similar. I have many small files and
> one mapper needs to process an entire file. So I generate an input
> file
>
> /user/bc/ecapriolo/bible1/grid/10/0,dictionary.txt
> /user/bc/ecapriolo/bible1/grid/10/1,dictionary.txt
> /user/bc/ecapriolo/bible1/grid/10/2,dictionary.txt
>
> use nline input format:
>
> JobConf conf = new JobConf(getConf(), GridSearcher.class);
> conf.setJobName("GridSearcher");
> conf.setMapperClass(MapClass.class);
> conf.setInputFormat(NLineInputFormat.class);
> conf.setMapOutputKeyClass(Text.class);
> conf.setMapOutputValueClass(Text.class);
> FileInputFormat.setInputPaths(conf, new Path("/user/bc/gridsearchcmd.txt"));
> FileOutputFormat.setOutputPath(conf, new Path("/user/bc/gridsearchres"));
>
> Now each mapper opens and processes the entire file using
> FSDataInputStream. It is an anti-pattern, but my map is NOT feeding me
> line per line of data. It is only feeding me the names of files to
> open. One map one file.
>
> On Sat, Jan 23, 2010 at 9:54 AM, Raymond Jennings III
> <[email protected]> wrote:
> > Not sure if this solves your problem but I had a similar case where there
> > was unique data at the beginning of the file and if that file was split
> > between maps I would lose that for the 2nd and subsequent maps. I was able
> > to pull the file name from the conf and read the first two lines for every
> > map.
> >
> > --- On Sat, 1/23/10, stolikp <[email protected]> wrote:
> >
> >> From: stolikp <[email protected]>
> >> Subject: Passing whole text file to a single map
> >> To: [email protected]
> >> Date: Saturday, January 23, 2010, 9:49 AM
> >>
> >> I've got some text files in my input directory and I want
> >> to pass each single text file (whole file not just a line)
> >> to a map (one file per one map). How can I do this?
> >> TextInputFormat splits text into lines and I do not want
> >> this to happen.
> >> I tried:
> >> http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
> >> but it doesn't work for me, compiler doesn't know what
> >> NonSplitableTextInputFormat.class is.
> >> I'm using hadoop 0.20.1
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27286204p27286204.html
> >> Sent from the Hadoop core-user mailing list archive at
> >> Nabble.com.

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
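
The manifest approach Edward describes in the thread (one file name per input line, one line per map, and the mapper opens the named file itself) can be tried out without a cluster. What follows is only a minimal plain-Java stand-in, not Hadoop code: `WholeFilePerRecord`, `buildManifest`, and `mapOneRecord` are hypothetical names made up for illustration, and local `java.nio` reads stand in for the `FSDataInputStream` reads mentioned in the thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java stand-in for the "one manifest line = one whole file" pattern.
// Nothing here touches Hadoop; class and method names are hypothetical.
class WholeFilePerRecord {

    // Build the manifest: one absolute path per entry, sorted for stable order.
    // In the thread, the equivalent list is written to a file that becomes
    // the job input.
    static List<String> buildManifest(Path inputDir) {
        try (Stream<Path> files = Files.list(inputDir)) {
            return files.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().toString())
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for map(): the record value is a file NAME, not file content.
    // The "mapper" opens that file itself and reads it in full.
    static String mapOneRecord(String manifestLine) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(manifestLine.trim()));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two small sample files in a temporary directory.
        Path dir = Files.createTempDirectory("wholefile");
        Files.write(dir.resolve("a.txt"),
                "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"),
                "only line\n".getBytes(StandardCharsets.UTF_8));

        // Each manifest entry is handled by exactly one "map" call.
        for (String line : buildManifest(dir)) {
            String content = mapOneRecord(line);
            System.out.println(line + " -> " + content.length() + " chars");
        }
    }
}
```

In an actual job, the manifest would be the file passed to `FileInputFormat.setInputPaths`, with `NLineInputFormat` handing its lines to map tasks as in the quoted configuration. The trade-off is the one noted in the thread: the map input is only a file name, so the mapper does the file I/O itself.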
