[ 
https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941824#comment-16941824
 ] 

Sebastian Nagel commented on NUTCH-1186:
----------------------------------------

Normalization can be disabled by setting urlnormalizer.scope.partition = 
org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer, and I think this is 
what the 
[PassURLNormalizer|https://nutch.apache.org/apidocs/apidocs-1.15/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.html]
 was created for. This needs either to be fixed (see Markus' patch) or to be 
documented well in nutch-default.xml. We could also define PassURLNormalizer as 
the default for urlnormalizer.scope.partition.
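
For illustration only, a minimal sketch of that setting as a site-level 
override. It assumes the override is placed in conf/nutch-site.xml; the 
description text is hypothetical and is not taken from nutch-default.xml or 
from the attached patches.

{code:xml}
<!-- Assumed location: conf/nutch-site.xml (overrides nutch-default.xml) -->
<property>
  <name>urlnormalizer.scope.partition</name>
  <value>org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer</value>
  <!-- Hypothetical description, for illustration only -->
  <description>Use the pass-through normalizer for the partition scope,
  so that URLs are left unchanged in this scope.</description>
</property>
{code}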

> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>         Attachments: NUTCH-1186-1.7-1.patch, NUTCH-1186-1.7-2.patch
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes 
> all URLs in the input directory. The -filter option is respected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
