Generalize the SequenceFileInputFilter to apply to any InputFormat
------------------------------------------------------------------

                 Key: HADOOP-449
                 URL: http://issues.apache.org/jira/browse/HADOOP-449
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.5.0
            Reporter: Owen O'Malley
         Assigned To: Owen O'Malley
             Fix For: 0.6.0


I'd like to generalize the SequenceFileInputFilter that was introduced in
HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:

interface WritableFilter {
  boolean accept(Writable item);
}

class FilterInputFormat implements InputFormat {
  ...
}
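A minimal, self-contained sketch of the filtering behaviour proposed above. It
is not Hadoop code: Writable is replaced by a type parameter so the example
compiles without Hadoop, and the hypothetical FilteringReader stands in for the
RecordReader that FilterInputFormat would hand out. It assumes filters are
applied to the record key, and that a record is kept only if every filter in
the list accepts it (the proposal does not say how multiple filters combine).

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Stand-in for the proposed interface; Hadoop's Writable is replaced by a
// type parameter so this sketch compiles without Hadoop on the classpath.
interface WritableFilter<T> {
    boolean accept(T item);
}

// Hypothetical wrapper: yields only those records from an underlying reader
// whose key is accepted by every configured filter (assumed AND semantics).
class FilteringReader<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> source;
    private final List<WritableFilter<K>> filters;
    private Map.Entry<K, V> pending; // next accepted record, once found

    FilteringReader(Iterator<Map.Entry<K, V>> source, List<WritableFilter<K>> filters) {
        this.source = source;
        this.filters = filters;
    }

    private boolean acceptedByAll(K key) {
        for (WritableFilter<K> f : filters) {
            if (!f.accept(key)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public boolean hasNext() {
        // Advance the underlying reader until an accepted record is found.
        while (pending == null && source.hasNext()) {
            Map.Entry<K, V> candidate = source.next();
            if (acceptedByAll(candidate.getKey())) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<K, V> result = pending;
        pending = null;
        return result;
    }
}
```

Because filtering happens entirely inside the wrapper, the underlying source
does not need to know filters exist, which is what lets the idea apply to any
InputFormat rather than only to SequenceFiles.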

FilterInputFormat would look in the JobConf for:
   mapred.input.filter.source = the underlying input format
   mapred.input.filter.filters = a list of class names that implement WritableFilter
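One way FilterInputFormat might read these two settings, sketched with
java.util.Properties standing in for JobConf. The comma-separated format of
mapred.input.filter.filters, the requirement that each filter class have a
no-arg constructor, and the FilterConfig and NonEmptyFilter names are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Stand-in for the proposed interface (Writable replaced by Object so the
// sketch compiles without Hadoop).
interface WritableFilter {
    boolean accept(Object item);
}

// Hypothetical example filter: accepts items whose string form is non-empty.
class NonEmptyFilter implements WritableFilter {
    public boolean accept(Object item) {
        return item != null && !item.toString().isEmpty();
    }
}

// Hypothetical helper; java.util.Properties stands in for JobConf.
class FilterConfig {
    static final String SOURCE_KEY = "mapred.input.filter.source";
    static final String FILTERS_KEY = "mapred.input.filter.filters";

    // Name of the wrapped InputFormat class; a real version would
    // instantiate it reflectively, just like the filters below.
    static String sourceClassName(Properties conf) {
        return conf.getProperty(SOURCE_KEY);
    }

    // Instantiates each class named in mapred.input.filter.filters, assuming
    // a comma-separated value and a no-arg constructor on each class.
    static List<WritableFilter> loadFilters(Properties conf)
            throws ReflectiveOperationException {
        List<WritableFilter> filters = new ArrayList<>();
        String names = conf.getProperty(FILTERS_KEY, "");
        for (String name : names.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate an unset value and trailing commas
            }
            // asSubclass fails fast if the class does not implement WritableFilter.
            Class<? extends WritableFilter> cls =
                Class.forName(trimmed).asSubclass(WritableFilter.class);
            filters.add(cls.getDeclaredConstructor().newInstance());
        }
        return filters;
    }
}
```

Loading filters by class name keeps the job configuration declarative: a job
opts into filtering purely through these two settings.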

The FilterInputFormat will work like the current SequenceFileInputFilter, but
will read records through the underlying InputFormat's RecordReader rather than
directly from a SequenceFile. This will require adding next(key) and
getCurrentValue(value) methods to the RecordReader interface, but that will be
addressed in a separate issue.
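A likely motivation for splitting next(key) from getCurrentValue(value) is that
a filter only needs the key, so the value of a rejected record never has to be
read. The sketch below assumes a hypothetical KeyFirstReader interface with that
shape (it is not the actual RecordReader API) and uses
java.util.function.Predicate in place of the filter list; the hypothetical
ArrayReader counts value reads to show that rejected values are skipped.

```java
import java.util.function.Predicate;

// Hypothetical reader shape assumed from the proposal: next(key) reads only
// the key, and getCurrentValue(value) reads the current record's value.
interface KeyFirstReader<K, V> {
    boolean next(K key);           // fills key in place; false at end of input
    void getCurrentValue(V value); // fills value in place
}

// Hypothetical in-memory reader with mutable StringBuilder keys and values.
class ArrayReader implements KeyFirstReader<StringBuilder, StringBuilder> {
    private final String[] keys;
    private final String[] values;
    private int pos = -1;
    int valuesRead = 0; // number of values actually materialized

    ArrayReader(String[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    public boolean next(StringBuilder key) {
        if (pos + 1 >= keys.length) {
            return false;
        }
        pos++;
        key.setLength(0);
        key.append(keys[pos]);
        return true;
    }

    public void getCurrentValue(StringBuilder value) {
        valuesRead++;
        value.setLength(0);
        value.append(values[pos]);
    }
}

// Wraps any KeyFirstReader and skips records whose key the predicate rejects.
class FilteredKeyFirstReader<K, V> implements KeyFirstReader<K, V> {
    private final KeyFirstReader<K, V> inner;
    private final Predicate<K> filter;

    FilteredKeyFirstReader(KeyFirstReader<K, V> inner, Predicate<K> filter) {
        this.inner = inner;
        this.filter = filter;
    }

    public boolean next(K key) {
        // Rejected records are passed over without ever requesting their values.
        while (inner.next(key)) {
            if (filter.test(key)) {
                return true;
            }
        }
        return false;
    }

    public void getCurrentValue(V value) {
        inner.getCurrentValue(value);
    }
}
```

With a single next(key, value) call, the wrapper would have to read every value
before it could decide to discard the record.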

-- 
This message is automatically generated by JIRA.