[ https://issues.apache.org/jira/browse/NUTCH-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914936#action_12914936 ]

Julien Nioche commented on NUTCH-864:
-------------------------------------

In theory we should not see any entries with a status of 0 after updating, but 
having an explicit code would be much cleaner. The code as it is now should not 
create webpages with the default status anywhere else, but if that ever 
happened we would not be able to distinguish them from the status 0 used by 
the redirections.
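
To make the ambiguity concrete, here is a minimal Python sketch (not Nutch 
code). STATUS_REDIR_TARGET and its value 100 are hypothetical, made up purely 
for illustration; the only thing taken from the discussion above is that the 
default status is 0.

```python
from collections import Counter

DEFAULT_STATUS = 0          # what a row gets when nothing sets its status
STATUS_REDIR_TARGET = 100   # hypothetical explicit code, not a Nutch constant


def summarize(statuses):
    """Count rows per status code, the way a stats report would."""
    return Counter(statuses)


# Today: two rows created by redirections and one row created elsewhere with
# the default status all land in the same bucket.
ambiguous = summarize([DEFAULT_STATUS, DEFAULT_STATUS, DEFAULT_STATUS])

# With an explicit code, the stray default-status row stands out.
explicit = summarize([STATUS_REDIR_TARGET, STATUS_REDIR_TARGET, DEFAULT_STATUS])

print(dict(ambiguous))
print(dict(explicit))
```

With an explicit code, any remaining status 0 entry would point to a page 
created somewhere it should not have been.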

> Fetcher generates entries with status 0
> ---------------------------------------
>
>                 Key: NUTCH-864
>                 URL: https://issues.apache.org/jira/browse/NUTCH-864
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>         Environment: Gora with SQLBackend
> URL: https://svn.apache.org/repos/asf/nutch/branches/nutchbase
> Last Changed Rev: 980748
> Last Changed Date: 2010-07-30 14:19:52 +0200 (Fri, 30 Jul 2010)
>            Reporter: Julien Nioche
>            Assignee: Doğacan Güney
>             Fix For: 2.0
>
>
> After a round of fetching which got the following protocol status :
> 10/07/30 15:11:39 INFO mapred.JobClient:     ACCESS_DENIED=2
> 10/07/30 15:11:39 INFO mapred.JobClient:     SUCCESS=1177
> 10/07/30 15:11:39 INFO mapred.JobClient:     GONE=3
> 10/07/30 15:11:39 INFO mapred.JobClient:     TEMP_MOVED=138
> 10/07/30 15:11:39 INFO mapred.JobClient:     EXCEPTION=93
> 10/07/30 15:11:39 INFO mapred.JobClient:     MOVED=521
> 10/07/30 15:11:39 INFO mapred.JobClient:     NOTFOUND=62
> I ran : ./nutch org.apache.nutch.crawl.WebTableReader -stats
> 10/07/30 15:12:37 INFO crawl.WebTableReader: Statistics for WebTable: 
> 10/07/30 15:12:37 INFO crawl.WebTableReader: TOTAL urls:      2690
> 10/07/30 15:12:37 INFO crawl.WebTableReader: retry 0: 2690
> 10/07/30 15:12:37 INFO crawl.WebTableReader: min score:       0.0
> 10/07/30 15:12:37 INFO crawl.WebTableReader: avg score:       0.7587361
> 10/07/30 15:12:37 INFO crawl.WebTableReader: max score:       1.0
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 0 (null): 649
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 2 (status_fetched): 1177 (SUCCESS=1177)
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 3 (status_gone): 112
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 34 (status_retry): 93 (EXCEPTION=93)
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 4 (status_redir_temp): 138 (TEMP_MOVED=138)
> 10/07/30 15:12:37 INFO crawl.WebTableReader: status 5 (status_redir_perm): 521 (MOVED=521)
> 10/07/30 15:12:37 INFO crawl.WebTableReader: WebTable statistics: done
> There should not be any entries with status 0 (null)
> I will investigate a bit more...
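
As a quick sanity check on the figures quoted above, the per-status counts 
can be added up and compared with the reported total. This is only a sketch 
in Python; the numbers are copied by hand from the WebTableReader output and 
the variable names are mine.

```python
# Per-status counts copied from the WebTableReader -stats output above.
status_counts = {
    0: 649,    # (null)
    2: 1177,   # status_fetched
    3: 112,    # status_gone
    34: 93,    # status_retry
    4: 138,    # status_redir_temp
    5: 521,    # status_redir_perm
}
total_urls = 2690  # "TOTAL urls" line

# The buckets account for every URL in the table.
counted = sum(status_counts.values())
print(counted == total_urls)

# Share of the table sitting in status 0.
null_share = status_counts[0] / total_urls
print(f"{null_share:.1%}")
```

The buckets do add up to the total, so the status 0 entries are real rows in 
the table (roughly a quarter of it) and not a counting artifact.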

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
