Cool. I'd like to be walked through the "man page" for this.
Could you forward it to me, or be prepared to demo it in the review?
On Aug 1, 2006, at 5:34 PM, Hairong Kuang (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]
Hairong Kuang updated HADOOP-412:
---------------------------------
Attachment: filter.patch
This patch provides class SequenceFileInputFilter, which can feed a
subset of sequence file records to map tasks. It provides a class
method setFilter that defines a filtering criterion.
The patch provides three filters: RegexFilter, PercentFilter, and
MD5Filter, but a programmer may define their own filter. Any
user-defined filter should either implement the interface Filter or
extend FilterBase.
A junit test is also included.
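
To make the three filter kinds concrete, here is a minimal, standalone sketch of what they might do. It does not use the Hadoop classes from the patch: the Filter interface below is a hypothetical stand-in that works on String keys, the select helper is invented for illustration, and the exact semantics of PercentFilter (one record in every N) and MD5Filter (key hash modulo N) are assumptions, not taken from filter.patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FilterSketch {
    /** Hypothetical stand-in for the patch's Filter interface. */
    interface Filter {
        boolean accept(String key);
    }

    /** Accepts keys that fully match a regular expression. */
    static class RegexFilter implements Filter {
        private final Pattern pattern;
        RegexFilter(String regex) { this.pattern = Pattern.compile(regex); }
        public boolean accept(String key) { return pattern.matcher(key).matches(); }
    }

    /** Assumed semantics: accepts the first record of every `frequency` records seen. */
    static class PercentFilter implements Filter {
        private final int frequency;
        private int count = 0;
        PercentFilter(int frequency) {
            if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
            this.frequency = frequency;
        }
        public boolean accept(String key) {
            boolean accepted = (count == 0);
            count = (count + 1) % frequency;
            return accepted;
        }
    }

    /** Assumed semantics: accepts a key when its MD5-derived hash is divisible by `frequency`. */
    static class MD5Filter implements Filter {
        private final int frequency;
        MD5Filter(int frequency) {
            if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
            this.frequency = frequency;
        }
        public boolean accept(String key) {
            try {
                byte[] d = MessageDigest.getInstance("MD5")
                        .digest(key.getBytes(StandardCharsets.UTF_8));
                // Fold the first four digest bytes into an int.
                int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                      | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
                return Math.floorMod(h, frequency) == 0;
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    /** Illustrative helper: returns the keys the filter lets through, in order. */
    static List<String> select(List<String> keys, Filter filter) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (filter.accept(k)) out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user-1", "admin-2", "user-3", "user-4", "admin-5");
        System.out.println(select(keys, new RegexFilter("user-.*")));
        System.out.println(select(keys, new PercentFilter(2)));
        System.out.println(select(keys, new MD5Filter(2)));
    }
}
```

Under these assumptions, a hash-based filter selects the same keys on every run, while a counting filter depends on the order in which records are read.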
provide an input format that fetches a subset of sequence file records
----------------------------------------------------------------------
Key: HADOOP-412
URL: http://issues.apache.org/jira/browse/HADOOP-412
Project: Hadoop
Issue Type: New Feature
Components: mapred
Affects Versions: 0.4.0
Reporter: Hairong Kuang
Assigned To: Hairong Kuang
Fix For: 0.4.0
Attachments: filter.patch
Sometimes a map/reduce job only wants to work on a subset of its input
data, either for the needs of its application or during debugging.
It would be convenient if an input format handled this transparently.
It should provide an API that allows a programmer to specify
filtering criteria.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira