How to write a custom input format and record reader to read multiple lines of
text from files
----------------------------------------------------------------------------------------------
Key: MAPREDUCE-1255
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1255
Project: Hadoop Map/Reduce
Issue Type: Task
Affects Versions: 0.20.1
Environment: Ubuntu, 32 bit system. Apache hadoop 0.20.1
Reporter: Kunal Gupta
Priority: Minor
Can someone explain how to override the "FileInputFormat" and "RecordReader" in
order to be able to read multiple lines of text from input files in a single
map task?
Here the key would be the offset of the first line of text, and the value would
be the N lines of text.
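To make the intended record shape concrete, here is a minimal sketch in plain
Java. It is not Hadoop code: the class name MultiLineGrouping and the method
group are hypothetical, and it assumes "\n" line terminators and UTF-8 text.
It only illustrates the grouping logic a custom RecordReader would need: the
key is the byte offset of the first line in each group, and the value is up to
N lines joined together.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hadoop.
final class MultiLineGrouping {

    /**
     * Groups text into records of up to n lines each.
     * Key: byte offset of the record's first line. Value: its lines joined by '\n'.
     */
    static Map<Long, String> group(String text, int n) {
        Map<Long, String> records = new LinkedHashMap<>();
        StringBuilder value = new StringBuilder();
        long offset = 0;       // byte offset of the next unread line
        long recordStart = 0;  // byte offset of the first line in the current record
        int linesInRecord = 0;
        int pos = 0;           // character position in text

        while (pos < text.length()) {
            int nl = text.indexOf('\n', pos);
            int end = (nl < 0) ? text.length() : nl;
            String line = text.substring(pos, end);

            if (linesInRecord == 0) {
                recordStart = offset;
            } else {
                value.append('\n');
            }
            value.append(line);
            linesInRecord++;

            // Advance by the encoded length of the line plus its terminator, if any.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (nl < 0 ? 0 : 1);
            pos = (nl < 0) ? text.length() : nl + 1;

            if (linesInRecord == n) {
                records.put(recordStart, value.toString());
                value.setLength(0);
                linesInRecord = 0;
            }
        }
        // Emit a final, shorter record if the input did not divide evenly.
        if (linesInRecord > 0) {
            records.put(recordStart, value.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Long, String> records = group("a\nbb\nccc\ndddd\neeeee\n", 2);
        for (Map.Entry<Long, String> e : records.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().replace("\n", "\\n"));
        }
    }
}
```

A real RecordReader would also have to deal with input split boundaries, which
this sketch ignores.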
I have extended the class FileInputFormat:
public class MultiLineFileInputFormat
extends FileInputFormat<LongWritable, Text>{
...
}
and implemented the abstract method:
public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
    TaskAttemptContext context)
    throws IOException, InterruptedException {...}
I have also extended the RecordReader class:
public class MultiLineFileRecordReader extends RecordReader<LongWritable, Text>
{...}
and in the job configuration, specified this new InputFormat class:
job.setInputFormatClass(MultiLineFileInputFormat.class);
When I run this new map/reduce program, I get the following Java error:
Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at CustomRecordReader.main(CustomRecordReader.java:257)
Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
    at java.lang.Class.getConstructor0(Class.java:2706)
    at java.lang.Class.getDeclaredConstructor(Class.java:1985)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
    ... 5 more
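
The "$" in CustomRecordReader$MultiLineFileInputFormat indicates that the input
format is a class nested inside CustomRecordReader, and the trace shows the
lookup failing in Class.getDeclaredConstructor. One likely cause, offered here
as an assumption since the full source is not shown, is that the nested class
is not declared static. A non-static inner class takes its enclosing instance
as a hidden constructor parameter, so a reflective lookup of a no-argument
constructor fails. The sketch below reproduces that behavior in plain Java;
NestedClassReflection, InnerFormat, StaticFormat, and hasNoArgConstructor are
hypothetical names used only for illustration.

```java
import java.lang.reflect.Constructor;

// Hypothetical illustration only; no Hadoop classes are involved.
final class NestedClassReflection {

    /** Non-static inner class: its constructor implicitly takes the enclosing instance. */
    class InnerFormat { }

    /** Static nested class: has a genuine no-argument constructor. */
    static class StaticFormat { }

    /** Returns true if cls declares a constructor that takes no parameters. */
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the reported stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(InnerFormat.class));  // prints false
        System.out.println(hasNoArgConstructor(StaticFormat.class)); // prints true
    }
}
```

If this is indeed the cause, declaring the nested input format as
"public static class", or moving it into its own source file, should let a
no-argument constructor be found. The same would apply to a nested RecordReader
class if it is ever created reflectively.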