[ 
https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142017#comment-16142017
 ] 

Marcos Bori commented on NUTCH-2413:
------------------------------------

I have updated the changes with the following:
- Fixed styling errors (spaces, tabs...) as per your comments. Thanks!
- I now read the configuration () in the constructor, again as per your 
comments :)
- I have realized that the normalizers used in the Fetch phase can be 
different from those in the Parsing phase (URLNormalizers.SCOPE_FETCHER and 
URLNormalizers.SCOPE_OUTLINK, respectively). I now set up the normalizer for 
outlinks accordingly.
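
To illustrate why the scope matters, here is a minimal sketch of scope-based 
normalization. It is not the patch and does not use Nutch classes: 
ScopedNormalizers, ScopeDemo and the two normalization rules are hypothetical 
stand-ins, assumed only to show that the same URL can normalize differently 
depending on which scope's chain is applied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a scope-aware normalizer registry (not Nutch code).
class ScopedNormalizers {
  static final String SCOPE_FETCHER = "fetcher";
  static final String SCOPE_OUTLINK = "outlink";

  // One ordered chain of normalizers per scope.
  private final Map<String, List<UnaryOperator<String>>> chains = new HashMap<>();

  void register(String scope, UnaryOperator<String> normalizer) {
    chains.computeIfAbsent(scope, k -> new ArrayList<>()).add(normalizer);
  }

  // Applies only the chain registered for the given scope, in order.
  String normalize(String url, String scope) {
    String result = url;
    for (UnaryOperator<String> n
        : chains.getOrDefault(scope, Collections.<UnaryOperator<String>>emptyList())) {
      result = n.apply(result);
    }
    return result;
  }
}

class ScopeDemo {
  static ScopedNormalizers build() {
    ScopedNormalizers n = new ScopedNormalizers();
    // Assumed rule for the fetcher scope: drop an explicit default port.
    n.register(ScopedNormalizers.SCOPE_FETCHER, u -> u.replace(":80/", "/"));
    // Assumed rule for the outlink scope: drop the fragment.
    n.register(ScopedNormalizers.SCOPE_OUTLINK, u -> {
      int i = u.indexOf('#');
      return i < 0 ? u : u.substring(0, i);
    });
    return n;
  }

  public static void main(String[] args) {
    ScopedNormalizers n = build();
    String url = "http://example.com:80/page#top";
    System.out.println(n.normalize(url, ScopedNormalizers.SCOPE_FETCHER));
    System.out.println(n.normalize(url, ScopedNormalizers.SCOPE_OUTLINK));
  }
}
```

With these assumed rules, normalizing outlinks with the fetcher chain would 
leave the fragment in place, which is why the outlink scope needs its own 
normalizer instance.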

> When fetching and parsing together, parameter "parse.filter.urls" is ignored
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-2413
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2413
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch release 1.13.
>            Reporter: Marcos Bori
>             Fix For: 1.14
>
>
> Suppose we want to:
> (1) Execute the fetch and parse together ("fetcher.parse" set to "true").
> (2) Avoid applying the URL filters when executing this phase.
> When parsing is executed as a separate process, condition (2) can be 
> configured by setting "parse.filter.urls" to "false".
> However, this setting ("parse.filter.urls") is ignored when we execute the 
> fetch and parse phases together.
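
The expected behaviour can be sketched as follows. This is only an 
illustration under stated assumptions, not the Nutch implementation: 
OutlinkCollector is a hypothetical class, java.util.Properties stands in for 
the job configuration, and the default of "true" for "parse.filter.urls" is 
assumed. The flag is read once in the constructor and the URL filter is 
skipped entirely when it is "false".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Predicate;

// Hypothetical stand-in for the outlink handling of a combined
// fetch-and-parse step (not Nutch code).
class OutlinkCollector {
  private final boolean filterUrls;
  private final Predicate<String> urlFilter;

  // Properties stands in for the job configuration; the flag is read once here.
  OutlinkCollector(Properties conf, Predicate<String> urlFilter) {
    // Assumed default: filtering is enabled unless explicitly turned off.
    this.filterUrls =
        Boolean.parseBoolean(conf.getProperty("parse.filter.urls", "true"));
    this.urlFilter = urlFilter;
  }

  // Keeps every outlink when filtering is disabled; otherwise keeps only
  // the outlinks accepted by the filter.
  List<String> collect(List<String> outlinks) {
    List<String> kept = new ArrayList<>();
    for (String link : outlinks) {
      if (!filterUrls || urlFilter.test(link)) {
        kept.add(link);
      }
    }
    return kept;
  }
}
```

With the property unset, a filter that accepts only one host drops outlinks 
to other hosts; with "parse.filter.urls" set to "false", all outlinks pass 
through unfiltered.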



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
