[ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]
Hairong Kuang updated HADOOP-412:
---------------------------------
Attachment: filter.patch
This patch provides the class SequenceFileInputFilter, which feeds a subset of
sequence file records to map tasks. Its class method setFilter defines the
filtering criterion.
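As an illustration only, not the patch's code, the core mechanism can be sketched as a reader that wraps the underlying record source and skips every record whose key the filter rejects, so a map task sees only the accepted subset. `KeyFilter` and `FilteringReader` are hypothetical names, an in-memory iterator stands in for a sequence file reader, and the sketch assumes the filter decides on the record key alone.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the filtering criterion: decides per record key.
interface KeyFilter<K> {
    boolean accept(K key);
}

// Wraps an iterator over (key, value) records and yields only the records
// whose key the filter accepts; rejected records are skipped, never returned.
final class FilteringReader<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> underlying;
    private final KeyFilter<K> filter;
    private Map.Entry<K, V> pending; // next accepted record, once found

    FilteringReader(Iterator<Map.Entry<K, V>> underlying, KeyFilter<K> filter) {
        this.underlying = underlying;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Advance past rejected records until an accepted one is buffered.
        while (pending == null && underlying.hasNext()) {
            Map.Entry<K, V> candidate = underlying.next();
            if (filter.accept(candidate.getKey())) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<K, V> result = pending;
        pending = null;
        return result;
    }
}
```

Because rejected records are skipped inside `hasNext()`, the consumer needs no knowledge of the filter, which is the transparent handling the issue asks for.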
The patch provides three Filters: RegexFilter, PercentFilter, and MD5Filter.
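The three filters differ only in how they decide on a key. Below is a standalone sketch of one plausible reading of each, not the patch's implementation: `RegexFilter` keeps keys whose string form matches a pattern, `PercentFilter` keeps one record in every `frequency` records it sees (so a frequency of 10 keeps 10%), and `MD5Filter` keeps a key when the first 8 bytes of its MD5 digest, reduced modulo `frequency`, are zero. The `Filter` interface with a single `accept` method, the constructors, and the 1-in-N interpretation are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Hypothetical common contract: decide whether a record with this key is kept.
interface Filter {
    boolean accept(Object key);
}

// Keeps keys whose toString() fully matches a regular expression.
final class RegexFilter implements Filter {
    private final Pattern pattern;
    RegexFilter(String regex) { this.pattern = Pattern.compile(regex); }
    @Override public boolean accept(Object key) {
        return pattern.matcher(key.toString()).matches();
    }
}

// Keeps the first record of every window of `frequency` records, in arrival
// order, so the selection depends on the order records are read.
final class PercentFilter implements Filter {
    private final int frequency;
    private int count; // position within the current window
    PercentFilter(int frequency) {
        if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
        this.frequency = frequency;
    }
    @Override public boolean accept(Object key) {
        boolean accepted = (count == 0);
        count = (count + 1) % frequency;
        return accepted;
    }
}

// Keeps a key when the first 8 bytes of its MD5 digest, read as a long, are
// divisible by `frequency`; the same key always gets the same decision.
final class MD5Filter implements Filter {
    private final int frequency;
    MD5Filter(int frequency) {
        if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
        this.frequency = frequency;
    }
    @Override public boolean accept(Object key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.toString().getBytes(StandardCharsets.UTF_8));
            long hash = ByteBuffer.wrap(digest, 0, 8).getLong();
            return Math.floorMod(hash, frequency) == 0;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

In this sketch the MD5 variant selects roughly the same fraction as the percent variant, but its choice depends only on key content, so the same keys are selected on every run regardless of record order.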
A programmer may also define a custom filter: any user-defined filter should
either implement the Filter interface or extend FilterBase.
A JUnit test is also included.
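A user-defined filter can take either route described above. The following sketch assumes a minimal `Filter` interface and a hypothetical `FilterBase` whose `setConf`/`configure` pair (invented here, with `java.util.Properties` standing in for the job configuration) lets a subclass read its parameters; neither signature is taken from the patch, and the configuration key `filter.prefix` is made up. `PrefixFilter` extends the base class, while `ShortKeyFilter` implements `Filter` directly.

```java
import java.util.Properties;

// Hypothetical filter contract: decide whether a record with this key is kept.
interface Filter {
    boolean accept(Object key);
}

// Hypothetical convenience base class: stores the configuration and offers a
// hook, so a subclass only reads its parameters and implements accept().
abstract class FilterBase implements Filter {
    protected Properties conf = new Properties();

    public void setConf(Properties conf) {
        this.conf = conf;
        configure(); // let the subclass pick up its parameters
    }

    // Called after setConf; the default reads nothing.
    protected void configure() { }
}

// Route 1: extend FilterBase. Keeps keys starting with the prefix stored under
// the made-up key "filter.prefix" (an empty prefix keeps every key).
final class PrefixFilter extends FilterBase {
    private String prefix = "";

    @Override
    protected void configure() {
        prefix = conf.getProperty("filter.prefix", "");
    }

    @Override
    public boolean accept(Object key) {
        return key.toString().startsWith(prefix);
    }
}

// Route 2: implement Filter directly. Keeps keys whose string form has at
// most maxLength characters.
final class ShortKeyFilter implements Filter {
    private final int maxLength;

    ShortKeyFilter(int maxLength) {
        this.maxLength = maxLength;
    }

    @Override
    public boolean accept(Object key) {
        return key.toString().length() <= maxLength;
    }
}
```

In this sketch, extending the base class pays off when a filter needs job parameters; implementing the interface directly is enough for a filter configured entirely in code.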
> provide an input format that fetches a subset of sequence file records
> ----------------------------------------------------------------------
>
> Key: HADOOP-412
> URL: http://issues.apache.org/jira/browse/HADOOP-412
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.4.0
> Reporter: Hairong Kuang
> Assigned To: Hairong Kuang
> Fix For: 0.4.0
>
> Attachments: filter.patch
>
>
> Sometimes a map/reduce job only needs to work on a subset of its input data,
> either because the application requires it or during debugging. It would be
> convenient if an input format handled this transparently. It should provide an
> API that allows a programmer to specify the filtering criteria.
--
This message is automatically generated by JIRA.