Use FileInputFormat (the default TextInputFormat subclass is fine here); each mapper then processes a whole split, not a single line.
Your mapper will look something like this:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    // running total for all lines in this mapper's split
    // (long rather than int, to avoid overflow on a 1.4 GB input)
    private long sum = 0;

    @Override
    public void map(LongWritable key, Text value, Context context) {
        sum += Long.parseLong(value.toString().trim());
    }

    @Override
    public void cleanup(Context context)
            throws IOException, InterruptedException {
        // emit a single partial sum per mapper, not one record per line
        context.write(new Text("sum"), new Text(Long.toString(sum)));
    }
}
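The key idea above is map-side aggregation: each mapper keeps a local running total and emits one partial sum in cleanup(), and the reducer only has to add up a handful of partials. Stripped of the Hadoop API, the same two-phase logic looks like this (plain Java, class and method names are just illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class PartialSumSketch {
    // Phase 1 (map + cleanup): sum one split locally, emit one partial.
    static long partialSum(List<String> splitLines) {
        long sum = 0; // long, to avoid overflow on large inputs
        for (String line : splitLines) {
            sum += Long.parseLong(line.trim());
        }
        return sum;
    }

    // Phase 2 (reduce): combine the per-split partials into the final total.
    static long total(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        long p1 = partialSum(Arrays.asList("1", "2", "3"));  // one split
        long p2 = partialSum(Arrays.asList("10", "20"));     // another split
        System.out.println(total(Arrays.asList(p1, p2)));    // prints 36
    }
}
```

With one partial per split, the shuffle carries a few records instead of one record per input line, which is exactly what NLineInputFormat-style per-line mapping loses.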
Your reducer will look something like this:
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, NullWritable> {
    private final NullWritable outputValue = NullWritable.get();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        // add up the partial sums emitted by the mappers
        for (Text value : values) {
            sum += Long.parseLong(value.toString());
        }
        context.write(new Text(Long.toString(sum)), outputValue);
    }
}
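To wire these together you also need a driver. This is a minimal sketch assuming the new (org.apache.hadoop.mapreduce) API; the class names MyMapper/MyReducer and the "sum" job name just match the snippets above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sum");
        job.setJarByClass(SumDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(1);            // one reducer produces one total
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since every mapper emits the same key ("sum"), all partials land in a single reduce call regardless of partitioning, but a single reduce task keeps the output in one file.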
madhu phatak wrote:
>
> Hi,
> I have a very large file of size 1.4 GB. Each line of the file is a number.
> I want to find the sum of all those numbers.
> I wanted to use NLineInputFormat as an InputFormat, but it sends only one
> line to the Mapper, which is very inefficient.
> So can you guide me to write an InputFormat which splits the file
> into multiple splits, so that each mapper can read multiple
> lines from its split?
>
> Regards,
> Madhukar
>
--
View this message in context:
http://lucene.472066.n3.nabble.com/InputFormat-for-a-big-file-tp2105461p2107514.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.