[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511330 ]

Hudson commented on NUTCH-503:
------------------------------

Integrated in Nutch-Nightly #145 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/145/])

> Generator exits incorrectly for small fetchlists 
> -------------------------------------------------
>
>                 Key: NUTCH-503
>                 URL: https://issues.apache.org/jira/browse/NUTCH-503
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 0.8, 0.8.1, 0.9.0
>         Environment: Fedora Core 2, JDK 1.6
>            Reporter: Vishal Shah
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: emptyfetchlist.patch, emptyfetchlist.patch
>
>
>    I think I found the reason why the generator returns an empty 
> fetchlist for small fetch sizes. 
>  
>    After the first job finishes running, the generator checks the following 
> condition to see if it got an empty list:
>  
>     if (readers == null || readers.length == 0 || !readers[0].next(new FloatWritable())) {
>  
>   The third condition is incorrect here. In some cases, especially for small 
> fetchlists, the first partition might be empty while some other partition(s) 
> still contain URLs. In this case, the Generator incorrectly concludes that 
> all partitions are empty after looking only at the first one. This problem can 
> also occur when all URLs in the fetchlist are from the same host (or from a 
> very small number of hosts, or from a number of hosts that all map to a small 
> number of partitions).
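A minimal, self-contained sketch of the failure mode described above, assuming hash partitioning by host. It does not use Nutch or Hadoop classes: `PartitionReader`, `partitionFor`, `partitionByHost` and `openReaders` are hypothetical stand-ins, and the host-to-partition formula copies Hadoop's default `HashPartitioner` (`(hashCode & Integer.MAX_VALUE) % n`), which may differ from the partitioner Nutch actually uses. It contrasts the original first-partition-only check with a check over all partitions; it is not the code from `emptyfetchlist.patch`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the NUTCH-503 emptiness check (no Nutch/Hadoop classes).
final class EmptyFetchlistCheck {

    // Hypothetical stand-in for a per-partition reader: next() reads one
    // record and returns false once the partition is exhausted.
    interface PartitionReader {
        boolean next();
    }

    // Assumption: host -> partition via the HashPartitioner formula.
    static int partitionFor(String host, int numPartitions) {
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Group URLs so that every URL of one host lands in the same partition.
    static List<List<String>> partitionByHost(List<String> urls, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String url : urls) {
            String host = URI.create(url).getHost();
            parts.get(partitionFor(host, numPartitions)).add(url);
        }
        return parts;
    }

    // One reader per partition; each next() call consumes one URL.
    static PartitionReader[] openReaders(List<List<String>> parts) {
        PartitionReader[] readers = new PartitionReader[parts.size()];
        for (int i = 0; i < readers.length; i++) {
            Iterator<String> it = parts.get(i).iterator();
            readers[i] = () -> {
                if (!it.hasNext()) {
                    return false;
                }
                it.next();
                return true;
            };
        }
        return readers;
    }

    // Mirrors the original condition: only partition 0 is inspected.
    static boolean isEmptyFirstOnly(PartitionReader[] readers) {
        return readers == null || readers.length == 0 || !readers[0].next();
    }

    // The fetchlist is empty only if every partition is empty.
    static boolean isEmptyAllPartitions(PartitionReader[] readers) {
        if (readers == null) {
            return true;
        }
        for (PartitionReader reader : readers) {
            if (reader.next()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A small fetchlist whose URLs all come from a single host.
        List<String> urls = Arrays.asList("http://www.example.com/a", "http://www.example.com/b");
        List<List<String>> parts = partitionByHost(urls, 4);
        System.out.println("host partition: " + partitionFor("www.example.com", 4));
        // Fresh readers for each check, because next() consumes records.
        System.out.println("first-only says empty: " + isEmptyFirstOnly(openReaders(parts)));
        System.out.println("all-partitions says empty: " + isEmptyAllPartitions(openReaders(parts)));
    }
}
```

With a single host, whether the first-only check misfires depends only on whether that host hashes to partition 0; the all-partitions check gives the right answer either way. Because `next()` consumes a record, a real implementation would either reopen the readers afterwards or perform the check before reading the output for other purposes.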

-- 
This message is automatically generated by JIRA.
