[ 
https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514874
 ] 

Doğacan Güney commented on NUTCH-524:
-------------------------------------

If you are fetching N URLs from a single host, you should fetch all N of them 
from a single machine, no matter how many machines you have. This is necessary 
for web politeness (your fetcher should keep at most one connection open to a 
server at any time).
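
To illustrate the point above, here is a minimal sketch of host-based 
partitioning. It is not Nutch's actual partitioner; the function names 
partition_for and partition_urls and the choice of CRC32 as the hash are 
assumptions made only for this example. The idea is that the partition is 
derived from the host alone, so every URL on one host lands on the same machine.

```python
import zlib
from urllib.parse import urlsplit


def partition_for(url: str, num_partitions: int) -> int:
    """Pick a partition from the URL's host only (hypothetical helper)."""
    # hostname is lowercased by urlsplit; fall back to the raw string
    # if the URL has no parseable host.
    host = urlsplit(url).hostname or url
    # CRC32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def partition_urls(urls, num_partitions):
    """Group URLs into buckets, one bucket per fetcher machine."""
    buckets = {i: [] for i in range(num_partitions)}
    for url in urls:
        buckets[partition_for(url, num_partitions)].append(url)
    return buckets
```

With this scheme, adding more machines never splits a host across fetchers: a 
URL list containing only one host fills exactly one bucket and leaves the 
others empty.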

PS: Your patch unnecessarily removes and re-adds the entire file even though it 
actually changes just a single line. In the future, please do not attach a 
patch that touches lines it doesn't change.

> Generate Problem with Single Node
> ---------------------------------
>
>                 Key: NUTCH-524
>                 URL: https://issues.apache.org/jira/browse/NUTCH-524
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Daniel Clark
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: nutch-0.9_PartitionUrlByHost.patch
>
>
> Nutch with Hadoop has problems with a single node in URL list when there is a 
> cluster of two or more machines.  I will provide a fix for this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
