[ https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141568#comment-16141568 ]
Sebastian Nagel commented on NUTCH-2413:
----------------------------------------
Hi [~maborec], the property "parse.filter.urls" is not ignored by Fetcher while
parsing documents: see FetcherOutputFormat, which uses ParseOutputFormat. Your
patch does not apply to parsing but to the handling of redirects. If filtering
of redirects should be configurable, wouldn't it be better to introduce new
properties (e.g. "fetcher.filter.redirects" and, to be complete,
"fetcher.normalize.redirects")? Handling of redirects should be consistent and
not depend on whether Fetcher is parsing or not.
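
To illustrate the suggestion, here is a minimal, self-contained sketch of redirect handling driven by its own properties. It is not Nutch code: the class RedirectHandlingSketch, the method resolveRedirect, the Map-based configuration, and the defaults of "true" are assumptions made for the example, and the two property names are only the ones proposed above, not existing settings. The point it shows is that "fetcher.parse" plays no role in the decision.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class RedirectHandlingSketch {

    // Hypothetical property names, taken from the proposal in the comment.
    static final String FILTER_REDIRECTS = "fetcher.filter.redirects";
    static final String NORMALIZE_REDIRECTS = "fetcher.normalize.redirects";

    // Reads a boolean property; the default values used below are assumptions.
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    // Returns the redirect target to follow, or empty if the filter rejects it.
    // Note that "fetcher.parse" is never consulted here.
    static Optional<String> resolveRedirect(Map<String, String> conf,
                                            String target,
                                            UnaryOperator<String> normalizer,
                                            Predicate<String> filter) {
        String url = target;
        if (getBoolean(conf, NORMALIZE_REDIRECTS, true)) {
            url = normalizer.apply(url);
        }
        if (getBoolean(conf, FILTER_REDIRECTS, true) && !filter.test(url)) {
            return Optional.empty();
        }
        return Optional.of(url);
    }

    public static void main(String[] args) {
        // Stand-ins for the real URL normalizers and URL filters.
        UnaryOperator<String> lowerCase = u -> u.toLowerCase(Locale.ROOT);
        Predicate<String> notBlocked = u -> !u.contains("/blocked");

        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.parse", "true"); // no effect on redirect handling
        conf.put(FILTER_REDIRECTS, "false");
        // prints Optional[http://example.com/blocked]
        System.out.println(resolveRedirect(conf, "HTTP://Example.com/blocked", lowerCase, notBlocked));

        conf.put(FILTER_REDIRECTS, "true");
        // prints Optional.empty
        System.out.println(resolveRedirect(conf, "HTTP://Example.com/blocked", lowerCase, notBlocked));
    }
}
```

With separate properties like these, redirects are treated the same way whether "fetcher.parse" is "true" or "false", and "parse.filter.urls" keeps its meaning for outlinks only.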
> When fetching and parsing together, parameter "parse.filter.urls" is ignored
> ----------------------------------------------------------------------------
>
> Key: NUTCH-2413
> URL: https://issues.apache.org/jira/browse/NUTCH-2413
> Project: Nutch
> Issue Type: Bug
> Components: fetcher, parser
> Environment: Apache Nutch release 1.13.
> Reporter: Marcos Bori
>
> Consider a situation where we want to:
> (1) Execute fetch and parse together ("fetcher.parse" set to "true").
> (2) Avoid applying the URL filters during this phase.
> Condition (2) can be configured when parsing is executed as a separate
> process by setting "parse.filter.urls" to "false".
> However, this setting ("parse.filter.urls") is ignored when we execute the
> fetch and parse phases together.