Not sure if this solves your problem, but I had a similar case: there was unique data at the beginning of the file, and if the file was split between maps, the 2nd and subsequent maps would lose it. I was able to pull the file name from the conf and read the first two lines in every map.
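
A rough sketch of that idea, not the original code. HeaderReader and readHeader are made-up names, and it reads from the local file system only to stay self-contained; in a real job the file lives on HDFS, so it would have to be opened through Hadoop's FileSystem API instead. With the old mapred API the file name should be available in the job conf under "map.input.file", but check that against your Hadoop version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class HeaderReader {

    // Returns up to headerLines lines from the start of the file.
    // fileName is assumed to be a local path here; inside a mapper it would
    // come from the job conf (e.g. the "map.input.file" property) and be
    // opened through Hadoop's FileSystem rather than java.nio.
    static List<String> readHeader(String fileName, int headerLines) throws IOException {
        List<String> header = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            // Stop once enough lines are collected or the file ends.
            while (header.size() < headerLines && (line = reader.readLine()) != null) {
                header.add(line);
            }
        }
        return header;
    }
}
```

Each map would call readHeader(fileName, 2) once during setup and keep the result, so a map that gets a later split of the file still sees the header data.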

--- On Sat, 1/23/10, stolikp <[email protected]> wrote:

> From: stolikp <[email protected]>
> Subject: Passing whole text file to a single map
> To: [email protected]
> Date: Saturday, January 23, 2010, 9:49 AM
>
> I've got some text files in my input directory and I want to pass each single
> text file (whole file not just a line) to a map (one file per one map). How
> can I do this ? TextInputFormat splits text into lines and I do not want
> this to happen.
> I tried:
> http://hadoop.apache.org/common/docs/r0.20./streaming.html#How+do+I+process+files%2C+one+per+map%3F
> but it doesn't work for me, compiler doesn't know what
> NonSplitableTextInputFormat.class is.
> I'm using hadoop 0.20.1
> --
> View this message in context:
> http://old.nabble.com/Passing-whole-text-file-to-a-single-map-tp27286204p27286204.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
