Use TextInputFormat for Text files.
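The suggestion above can be wired up in a driver along these lines (a minimal sketch, assuming the new org.apache.hadoop.mapreduce API circa 0.20 and the MyMapper/MyReducer classes from Aman's example below; SumDriver and the input/output paths are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "sum numbers");
    job.setJarByClass(SumDriver.class);

    // TextInputFormat hands each map() call one line as a Text value,
    // while the framework still cuts the 1.4 GB file into block-sized
    // InputSplits, so many mappers run in parallel over it.
    job.setInputFormatClass(TextInputFormat.class);

    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);
    // One reducer so the partial sums collapse into a single total.
    job.setNumReduceTasks(1);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Each mapper emits one partial sum in cleanup(), so the single reducer only has to add up a handful of values rather than every line of the file.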

On Mon, Dec 20, 2010 at 2:29 PM, madhu phatak <[email protected]> wrote:
> If I use FileInputFormat it gives an instantiation error, since
> FileInputFormat is an abstract class.
>
> On Sat, Dec 18, 2010 at 3:21 AM, Aman <[email protected]> wrote:
>
>>
>> Use FileInputFormat
>>
>>
>> Your mapper will look something like this:
>>
>> public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
>>   private int sum = 0;
>>
>>   @Override
>>   public void map(LongWritable key, Text value, Context context) {
>>     sum += Integer.parseInt(value.toString().trim());
>>   }
>>
>>   @Override
>>   public void cleanup(Context context) throws IOException,
>>       InterruptedException {
>>     // Emit one partial sum per mapper, all under the same key.
>>     context.write(new Text("sum"), new Text(String.valueOf(sum)));
>>   }
>> }
>>
>> Your reducer will look something like this:
>>
>> public class MyReducer extends Reducer<Text, Text, Text, NullWritable> {
>>   private NullWritable outputValue = NullWritable.get();
>>
>>   @Override
>>   public void reduce(Text key, Iterable<Text> values, Context context)
>>       throws IOException, InterruptedException {
>>     int sum = 0;
>>     for (Text value : values) {
>>       sum += Integer.parseInt(value.toString());
>>     }
>>     context.write(new Text(String.valueOf(sum)), outputValue);
>>   }
>> }
>>
>>
>> madhu phatak wrote:
>> >
>> > Hi,
>> > I have a very large file of size 1.4 GB. Each line of the file is a
>> > number.
>> > I want to find the sum of all those numbers.
>> > I wanted to use NLineInputFormat as the InputFormat, but it sends only
>> > one line to each mapper, which is very inefficient.
>> > So can you guide me on writing an InputFormat which splits the file
>> > into multiple splits, so that each mapper can read multiple
>> > lines from its split?
>> >
>> > Regards
>> > Madhukar
>> >
>> >
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/InputFormat-for-a-big-file-tp2105461p2107514.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>
>



-- 
Harsh J
www.harshj.com
