[ http://issues.apache.org/jira/browse/NUTCH-382?page=all ]

Jim Kellerman updated NUTCH-382:
--------------------------------

    Attachment: patch.txt

Patch to fix this issue.

> Fix for NUTCH-365 introduced a bug if generate.max.per.host.by.ip is enabled
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-382
>                 URL: http://issues.apache.org/jira/browse/NUTCH-382
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 0.9.0
>            Reporter: Jim Kellerman
>         Attachments: patch.txt
>
>
> The fix for NUTCH-365 in org.apache.nutch.crawl.Generator.java (revision 
> 449088) introduced a bug: if generate.max.per.host.by.ip is enabled, the 
> Generator logs the warning "WARN  crawl.Generator (Generator.java:reduce(181)) - 
> Malformed URL: '38.99.15.82', skipping". The IP address in the message varies 
> with the host.
> This is because the hostname has already been converted to its IP address, 
> and the code:
>               host = normalizers.normalize(host, URLNormalizers.SCOPE_GENERATE_HOST_COUNT);
> will not normalize an IP address. To fix this problem, the code inserted in 
> revision 449088 should be placed inside an else block so that it is not 
> executed when generate.max.per.host.by.ip is enabled.
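
The control flow proposed in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not the contents of patch.txt: the class HostKeySketch, the method hostKey, and the two function parameters are hypothetical stand-ins for the Generator's IP lookup and for the normalizers.normalize(...) call.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "else block" fix; names are illustrative only.
public class HostKeySketch {

    /**
     * Returns the key used to count URLs per host.
     *
     * @param host          host name taken from the URL
     * @param byIp          stand-in for the generate.max.per.host.by.ip setting
     * @param resolveToIp   stand-in for the host-to-IP lookup
     * @param normalizeHost stand-in for the host normalization step
     */
    static String hostKey(String host,
                          boolean byIp,
                          UnaryOperator<String> resolveToIp,
                          UnaryOperator<String> normalizeHost) {
        if (byIp) {
            // The host is replaced by its IP address; the normalizer is
            // skipped because it would reject a bare IP address.
            return resolveToIp.apply(host);
        } else {
            // Normalization runs only when counting by host name.
            return normalizeHost.apply(host);
        }
    }

    public static void main(String[] args) {
        // Fixed lookup result so the example needs no network access.
        UnaryOperator<String> fakeResolver = h -> "38.99.15.82";
        UnaryOperator<String> lowerCase = h -> h.toLowerCase(Locale.ROOT);

        System.out.println(hostKey("Example.COM", true, fakeResolver, lowerCase));
        System.out.println(hostKey("Example.COM", false, fakeResolver, lowerCase));
    }
}
```

Passing the lookup and the normalizer as parameters is only a convenience for the sketch; the point is that the two branches are mutually exclusive, so an IP address never reaches the normalizer.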

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
