[ https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141568#comment-16141568 ]

Sebastian Nagel commented on NUTCH-2413:
----------------------------------------

Hi [~maborec], the property "parse.filter.urls" is not ignored by the Fetcher
when it parses documents: see FetcherOutputFormat, which uses
ParseOutputFormat. Your patch does not apply to parsing but to the handling of
redirects. If filtering redirects should be configurable, wouldn't it be better
to introduce new properties (e.g., "fetcher.filter.redirects" and, to be
complete, "fetcher.normalize.redirects")? Redirect handling should be
consistent and should not depend on whether the Fetcher is parsing or not.
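A minimal sketch of the proposed design, in plain Java: redirect normalization and filtering are each governed by their own property, and neither ever looks at "fetcher.parse", so redirects are handled the same way whether or not the Fetcher parses. The property names are the ones suggested above; the class RedirectPolicySketch, its methods, and the normalizer/filter lambdas are hypothetical stand-ins, not Nutch API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch (not Nutch API): redirect handling controlled by its own
 * properties, independent of "fetcher.parse".
 */
public class RedirectPolicySketch {

    // Reads a boolean property, falling back to a default when it is unset.
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v.trim());
    }

    /**
     * Returns the redirect target to follow, or empty if it is rejected.
     * "fetcher.parse" is deliberately never consulted here.
     */
    static Optional<String> handleRedirect(Map<String, String> conf, String target,
                                           UnaryOperator<String> normalizer,
                                           Predicate<String> filter) {
        String url = target;
        if (getBoolean(conf, "fetcher.normalize.redirects", true)) {
            url = normalizer.apply(url);
            if (url == null) {
                return Optional.empty(); // normalizer rejected the URL
            }
        }
        if (getBoolean(conf, "fetcher.filter.redirects", true) && !filter.test(url)) {
            return Optional.empty(); // filtered out
        }
        return Optional.of(url);
    }

    public static void main(String[] args) {
        // Stand-ins for the configured URL normalizers and filters (assumptions):
        // strip a "#fragment", and reject targets ending in ".pdf".
        UnaryOperator<String> stripFragment = u -> u.replaceAll("#.*$", "");
        Predicate<String> rejectPdf = u -> !u.endsWith(".pdf");

        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.parse", "true");
        System.out.println(handleRedirect(conf, "http://example.com/a.pdf#top",
                stripFragment, rejectPdf));

        conf.put("fetcher.filter.redirects", "false");
        System.out.println(handleRedirect(conf, "http://example.com/a.pdf#top",
                stripFragment, rejectPdf));
    }
}
```

Running main prints Optional.empty (the normalized .pdf target is filtered out) and then Optional[http://example.com/a.pdf] once filtering of redirects is switched off. Changing "fetcher.parse" to either value would not alter either result, which is the consistency argued for above.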

> When fetching and parsing together, parameter "parse.filter.urls" is ignored
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-2413
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2413
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>         Environment: Apache Nutch release 1.13.
>            Reporter: Marcos Bori
>
> In a situation where we want to:
> (1) Execute the fetch and parse together ("fetcher.parse" set to "true")
> (2) Avoid applying the URL filters during this phase.
> Condition (2) can be configured when parsing is executed as a separate 
> process by setting "parse.filter.urls" to "false".
> However, this setting ("parse.filter.urls") is ignored when we execute the 
> fetch and parse phases together. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
