[ https://issues.apache.org/jira/browse/NUTCH-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Luis Betancourt Gonzalez updated NUTCH-1985:
--------------------------------------------------
Attachment: NUTCH-1985.patch
> Adding a main() method to the MimeTypeIndexingFilter
> ----------------------------------------------------
>
> Key: NUTCH-1985
> URL: https://issues.apache.org/jira/browse/NUTCH-1985
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, metadata, plugin
> Affects Versions: 1.10
> Reporter: Jorge Luis Betancourt Gonzalez
> Priority: Minor
> Labels: features, patch, test
> Fix For: 1.10
>
> Attachments: NUTCH-1985.patch
>
>
> This makes it easy to test different rules files and check the
> expressions used to filter content based on the detected MIME type. Until
> now, the only way to check this was to run test crawls and inspect the data
> stored in Solr/Elasticsearch.
> This allows invoking the class through the {{bin/nutch plugin}} command,
> for example:
> {{bin/nutch plugin mimetype-filter
> org.apache.nutch.indexer.filter.MimeTypeIndexingFilter -h}}
> Two options are accepted: {{-h, --help}} to show the help, and {{-rules}}
> to specify the rules file to use. This makes it easy to experiment with
> different rules files until you get the desired behavior.
> After invoking the class, enter one valid MIME type per line. The output
> is the same MIME type prefixed with a {{+}} or {{-}} sign, indicating
> whether the given MIME type is allowed or denied, respectively.
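
The interactive check described above can be pictured with a small standalone sketch. This is not the code from the attached patch, and it does not read the plugin's rules file. The class name MimeRuleSketch, the regex-based rules, the first-match-wins evaluation, and the default-deny policy are all assumptions made for illustration. Only the input and output shape (one MIME type per line in, the same MIME type prefixed with + or - out) comes from the description.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the filter's rule check; NOT the Nutch implementation.
public class MimeRuleSketch {

    // One rule: a regular expression and whether a match means "allow".
    private static final class Rule {
        final Pattern pattern;
        final boolean allow;

        Rule(String regex, boolean allow) {
            this.pattern = Pattern.compile(regex);
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private final boolean defaultAllow; // assumed policy when no rule matches

    public MimeRuleSketch(boolean defaultAllow) {
        this.defaultAllow = defaultAllow;
    }

    // Registers a rule; rules are evaluated in insertion order.
    public MimeRuleSketch add(String regex, boolean allow) {
        rules.add(new Rule(regex, allow));
        return this;
    }

    // The first matching rule decides; otherwise the default policy applies.
    public boolean accepts(String mimeType) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(mimeType).find()) {
                return rule.allow;
            }
        }
        return defaultAllow;
    }

    // Formats the result as described in the issue: "+" or "-" plus the MIME type.
    public String check(String mimeType) {
        return (accepts(mimeType) ? "+" : "-") + mimeType;
    }

    public static void main(String[] args) throws IOException {
        // Example rules (assumed): allow text/html and any image type, deny the rest.
        MimeRuleSketch checker = new MimeRuleSketch(false)
                .add("^text/html$", true)
                .add("^image/", true);

        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                System.out.println(checker.check(line));
            }
        }
    }
}
```

With these example rules, entering text/html yields +text/html and entering application/pdf yields -application/pdf. The real filter takes its rules from the file passed with {{-rules}}, so its results depend on that file rather than on the hard-coded rules used here.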
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)