Generalize the SequenceFileInputFilter to apply to any InputFormat
------------------------------------------------------------------
Key: HADOOP-449
URL: http://issues.apache.org/jira/browse/HADOOP-449
Project: Hadoop
Issue Type: Improvement
Components: mapred
Affects Versions: 0.5.0
Reporter: Owen O'Malley
Assigned To: Owen O'Malley
Fix For: 0.6.0
I'd like to generalize the SequenceFileInputFilter that was introduced in
HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:
  interface WritableFilter {
    boolean accept(Writable item);
  }

  class FilterInputFormat implements InputFormat {
    ...
  }
FilterInputFormat would look in the JobConf for:
  mapred.input.filter.source  = the underlying input format
  mapred.input.filter.filters = a list of class names that implement WritableFilter
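A minimal sketch of how the filter list might be read from the configuration. It is not the proposed implementation: java.util.Properties stands in for JobConf, Writable is reduced to a marker interface, the comma-separated list format is an assumption, and FilterConfig and AcceptAllFilter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Stand-in for Hadoop's Writable, reduced to a marker so the sketch compiles alone.
interface Writable {}

// The filter interface proposed in this issue.
interface WritableFilter {
    boolean accept(Writable item);
}

// Hypothetical example filter that accepts every item.
class AcceptAllFilter implements WritableFilter {
    public boolean accept(Writable item) {
        return true;
    }
}

// Hypothetical helper; java.util.Properties stands in for JobConf.
final class FilterConfig {
    static final String FILTERS_KEY = "mapred.input.filter.filters";

    // Assumes the property holds a comma-separated list of class names.
    static List<WritableFilter> loadFilters(Properties conf) {
        List<WritableFilter> filters = new ArrayList<>();
        for (String name : conf.getProperty(FILTERS_KEY, "").split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // unset property or stray comma
            }
            try {
                // Reject classes that do not implement WritableFilter.
                Class<? extends WritableFilter> cls =
                        Class.forName(trimmed).asSubclass(WritableFilter.class);
                filters.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Bad filter class: " + trimmed, e);
            }
        }
        return filters;
    }
}
```

Each filter needs a no-argument constructor in this sketch, because the configuration carries only class names.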
The FilterInputFormat will work like the current SequenceFileInputFilter, but
will wrap an internal RecordReader rather than reading the SequenceFile
directly. This will require adding next(key) and getCurrentValue(value) methods
to the RecordReader interface, but that will be addressed in a different issue.
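The wrapping reader described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the RecordReader shape is a guess at the proposed two-step interface, the filters are applied to keys only, and StringWritable, ListRecordReader and FilterRecordReader are hypothetical names. Splitting the read into next(key) and getCurrentValue(value) lets the wrapper skip a rejected record without ever reading its value.

```java
import java.util.List;

// Stand-in for Hadoop's Writable, reduced to a marker so the sketch compiles alone.
interface Writable {}

// Hypothetical mutable string holder used for both keys and values.
final class StringWritable implements Writable {
    String value = "";
}

// The filter interface proposed in this issue.
interface WritableFilter {
    boolean accept(Writable item);
}

// Assumed shape of the extended reader: the key and the value are read separately.
interface RecordReader {
    boolean next(Writable key);           // false once the input is exhausted
    void getCurrentValue(Writable value); // value of the record last returned by next
}

// Hypothetical in-memory source, standing in for the underlying input format's reader.
final class ListRecordReader implements RecordReader {
    private final String[][] records;
    private int position = -1;
    int valueReads = 0; // counts how many values were actually materialized

    ListRecordReader(String[][] records) {
        this.records = records;
    }

    public boolean next(Writable key) {
        if (position + 1 >= records.length) {
            return false;
        }
        position++;
        ((StringWritable) key).value = records[position][0];
        return true;
    }

    public void getCurrentValue(Writable value) {
        ((StringWritable) value).value = records[position][1];
        valueReads++;
    }
}

// Wraps any reader and returns only the records whose key passes every filter.
final class FilterRecordReader implements RecordReader {
    private final RecordReader source;
    private final List<WritableFilter> filters;

    FilterRecordReader(RecordReader source, List<WritableFilter> filters) {
        this.source = source;
        this.filters = filters;
    }

    public boolean next(Writable key) {
        while (source.next(key)) {
            if (acceptedByAll(key)) {
                return true;
            }
            // Rejected: move on without touching the value.
        }
        return false;
    }

    public void getCurrentValue(Writable value) {
        source.getCurrentValue(value);
    }

    private boolean acceptedByAll(Writable key) {
        for (WritableFilter filter : filters) {
            if (!filter.accept(key)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would loop on next(key) and call getCurrentValue(value) only for the records it wants, so the underlying reader never has to produce values for filtered-out keys.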