Use TextInputFormat for text files.

On Mon, Dec 20, 2010 at 2:29 PM, madhu phatak <[email protected]> wrote:
> If I use FileInputFormat it gives an instantiation error, since FileInputFormat
> is an abstract class.
>
> On Sat, Dec 18, 2010 at 3:21 AM, Aman <[email protected]> wrote:
>>
>> Use FileInputFormat.
>>
>> Your mapper will look something like this:
>>
>> public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
>>     private long sum = 0;
>>
>>     @Override
>>     public void map(LongWritable key, Text value, Context context)
>>             throws IOException, InterruptedException {
>>         // Accumulate a per-task partial sum instead of emitting every line.
>>         sum += Long.parseLong(value.toString().trim());
>>     }
>>
>>     @Override
>>     public void cleanup(Context context)
>>             throws IOException, InterruptedException {
>>         // Emit one partial sum per map task, all under the same key.
>>         context.write(new Text("sum"), new Text(Long.toString(sum)));
>>     }
>> }
>>
>> Your reducer will look something like this:
>>
>> public class MyReducer extends Reducer<Text, Text, Text, NullWritable> {
>>     private final NullWritable outputValue = NullWritable.get();
>>
>>     @Override
>>     public void reduce(Text key, Iterable<Text> values, Context context)
>>             throws IOException, InterruptedException {
>>         // All partial sums share one key, so this adds them into the total.
>>         long sum = 0;
>>         for (Text value : values) {
>>             sum += Long.parseLong(value.toString());
>>         }
>>         context.write(new Text(Long.toString(sum)), outputValue);
>>     }
>> }
>>
>> madhu phatak wrote:
>> >
>> > Hi,
>> > I have a very large file of size 1.4 GB. Each line of the file is a
>> > number. I want to find the sum of all those numbers.
>> > I wanted to use NLineInputFormat as the InputFormat, but it sends only one
>> > line to each mapper, which is very inefficient.
>> > So can you guide me in writing an InputFormat which splits the file
>> > into multiple splits, where each mapper can read multiple
>> > lines from its split?
>> >
>> > Regards,
>> > Madhukar
>> >
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/InputFormat-for-a-big-file-tp2105461p2107514.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
--
Harsh J
www.harshj.com
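The partial-sum pattern in the thread (each mapper folds its lines into one sum in cleanup(), the reducer adds the per-task sums) can be sketched in plain Java without a Hadoop cluster. This is an illustrative sketch only: the LineSummer class and its method names are hypothetical, not part of any Hadoop API, and each List<String> stands in for one input split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the map/reduce flow described above:
// partialSum() mimics one map task's accumulate-then-emit-in-cleanup,
// totalSum() mimics the reducer combining the per-task partial sums.
public class LineSummer {

    // Mimics one map task: sum every numeric line in its split.
    static long partialSum(List<String> lines) {
        long sum = 0;
        for (String line : lines) {
            sum += Long.parseLong(line.trim());
        }
        return sum;
    }

    // Mimics the reducer: add up the partial sums, one per map task.
    static long totalSum(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two "splits" of a file holding one number per line.
        List<String> split1 = List.of("1", "2", "3");
        List<String> split2 = List.of("10", "20");

        List<Long> partials = new ArrayList<>();
        partials.add(partialSum(split1)); // 6
        partials.add(partialSum(split2)); // 30

        System.out.println(totalSum(partials)); // prints 36
    }
}
```

Because each map task emits a single number rather than one record per line, the shuffle moves only a handful of values regardless of input size, which is why the accumulate-in-cleanup() approach beats emitting every line (and why NLineInputFormat with one line per mapper is so inefficient here).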
