Cool. I'd like to be walked through the "man page" of this.

Forward to me? Or be prepared to demo in the review?

On Aug 1, 2006, at 5:34 PM, Hairong Kuang (JIRA) wrote:

     [ http://issues.apache.org/jira/browse/HADOOP-412?page=all ]

Hairong Kuang updated HADOOP-412:
---------------------------------

    Attachment: filter.patch

This patch provides the class SequenceFileInputFilter, which can feed a subset of sequence file records to map tasks. It provides a class method, setFilter, that defines the filtering criterion.

The patch provides three filters: RegexFilter, PercentFilter, and MD5Filter. A programmer may also define their own filter. Any user-defined filter should either implement the Filter interface or extend FilterBase.
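
As a rough illustration of the three filter kinds and of a user-defined filter, here is a minimal standalone sketch. It does not use the patch's classes: the Filter interface, its accept method, the constructor arguments, and the selection rules for PercentFilter and MD5Filter are all assumptions made for this example, since the message does not show the actual signatures.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class FilterSketch {
    /** Hypothetical stand-in for the patch's Filter interface. */
    interface Filter {
        boolean accept(String key);
    }

    /** Accepts keys that fully match a regular expression. */
    static class RegexFilter implements Filter {
        private final Pattern pattern;
        RegexFilter(String regex) { this.pattern = Pattern.compile(regex); }
        public boolean accept(String key) { return pattern.matcher(key).matches(); }
    }

    /** Accepts about `percent` of records, spread evenly by position (assumed rule). */
    static class PercentFilter implements Filter {
        private final int percent;
        private long count = 0;
        PercentFilter(int percent) {
            if (percent < 0 || percent > 100) throw new IllegalArgumentException("percent must be 0..100");
            this.percent = percent;
        }
        public boolean accept(String key) {
            // Keep a record whenever the running quota crosses an integer boundary.
            boolean keep = ((count + 1) * percent) / 100 > (count * percent) / 100;
            count++;
            return keep;
        }
    }

    /** Accepts keys whose MD5 hash is divisible by `frequency` (assumed rule). */
    static class MD5Filter implements Filter {
        private final int frequency;
        MD5Filter(int frequency) {
            if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
            this.frequency = frequency;
        }
        public boolean accept(String key) {
            try {
                byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
                // Fold the first four digest bytes into an int.
                int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16) | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
                return Math.floorMod(h, frequency) == 0;
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 is not available", e);
            }
        }
    }

    /** Returns the keys that the filter accepts, in input order. */
    static List<String> select(List<String> keys, Filter filter) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (filter.accept(key)) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "avocado", "banana", "cherry");
        System.out.println(select(keys, new RegexFilter("a.*")));
        System.out.println(select(keys, new PercentFilter(50)));
        // A user-defined filter, written here as a lambda over the hypothetical interface.
        System.out.println(select(keys, key -> key.length() > 5));
    }
}
```

A hash-based filter like the MD5 one picks the same keys on every run, which may be useful when a debugging run needs to be repeatable; a position-based filter depends on record order.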

A junit test is also included.

provide an input format that fetches a subset of sequence file records
-----------------------------------------------------------------------

                Key: HADOOP-412
                URL: http://issues.apache.org/jira/browse/HADOOP-412
            Project: Hadoop
         Issue Type: New Feature
         Components: mapred
   Affects Versions: 0.4.0
           Reporter: Hairong Kuang
        Assigned To: Hairong Kuang
            Fix For: 0.4.0

        Attachments: filter.patch


Sometimes a map/reduce job only needs to work on a subset of its input data, either because of what the application requires or during the debugging phase. It would be convenient if an input format handled this transparently. It should provide an API that allows a programmer to specify a filtering criterion.
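
One way to picture "transparent" filtering is a reader that wraps the underlying record stream and silently skips records the filter rejects, so the consumer only ever sees the accepted subset. The sketch below is plain Java under that assumption; FilteringIterator and readValues are hypothetical names for this example and are not taken from the patch or from Hadoop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class FilteringReaderSketch {
    /** Wraps a key/value record stream and yields only records whose key passes the filter. */
    static final class FilteringIterator implements Iterator<Map.Entry<String, String>> {
        private final Iterator<Map.Entry<String, String>> source;
        private final Predicate<String> keyFilter;
        private Map.Entry<String, String> next; // next accepted record, or null at end

        FilteringIterator(Iterator<Map.Entry<String, String>> source, Predicate<String> keyFilter) {
            this.source = source;
            this.keyFilter = keyFilter;
            advance();
        }

        /** Reads ahead until an accepted record is found or the source is exhausted. */
        private void advance() {
            next = null;
            while (source.hasNext()) {
                Map.Entry<String, String> candidate = source.next();
                if (keyFilter.test(candidate.getKey())) {
                    next = candidate;
                    return;
                }
            }
        }

        public boolean hasNext() { return next != null; }

        public Map.Entry<String, String> next() {
            if (next == null) throw new NoSuchElementException();
            Map.Entry<String, String> result = next;
            advance();
            return result;
        }
    }

    /** Plays the role of a map task: it consumes records without knowing about the filter. */
    static List<String> readValues(List<Map.Entry<String, String>> records, Predicate<String> keyFilter) {
        List<String> values = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = new FilteringIterator(records.iterator(), keyFilter);
        while (it.hasNext()) {
            values.add(it.next().getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        records.add(new AbstractMap.SimpleEntry<>("log-1", "a"));
        records.add(new AbstractMap.SimpleEntry<>("tmp-1", "b"));
        records.add(new AbstractMap.SimpleEntry<>("log-2", "c"));
        System.out.println(readValues(records, key -> key.startsWith("log-")));
    }
}
```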

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
