[ https://issues.apache.org/jira/browse/NUTCH-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-1044.
----------------------------------
Resolution: Fixed
Committed revision 1156342.
Thanks for reporting it.
> Redirected URLs and possibly all of their outlinked URLs have invalid scores.
> -----------------------------------------------------------------------------
>
> Key: NUTCH-1044
> URL: https://issues.apache.org/jira/browse/NUTCH-1044
> Project: Nutch
> Issue Type: Bug
> Components: fetcher, parser
> Affects Versions: 1.3
> Reporter: Nutch User - 1
> Assignee: Julien Nioche
> Priority: Critical
> Fix For: 1.4
>
> Attachments: NUTCH-1044-1.4.patch
>
>
> 1. http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
> 2. http://lucene.472066.n3.nabble.com/A-possible-solution-to-my-URL-redirection-and-zero-scores-problem-td3162164.html
> Note that URLs redirected by a meta refresh also end up with invalid
> scores. For such URLs a CrawlDatum is created on lines 157-177 of
> ParseOutputFormat.java
> (http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/parse/ParseOutputFormat.java?view=markup).
> The new CrawlDatum's score is never set after creation, so it stays at
> 1.0f, as can be seen on line 122 of CrawlDatum.java
> (http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/crawl/CrawlDatum.java?view=markup).
> A separate question is whether the redirected URL's score should simply be
> passed on to the new URL, or whether the redirection should be treated as a
> link, in which case the new URL's score would be
> 'originalScore' / ('numberOfOutlinks' + 1).
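
For reference, the two scoring options raised in the description can be compared with a small standalone sketch. It does not use any Nutch classes and is not the committed patch: the class name RedirectScoreSketch and the methods passThrough and asExtraLink are hypothetical, and the sample values in main are made up for illustration. Only the 1.0f initial value and the 'originalScore' / ('numberOfOutlinks' + 1) formula come from the report.

```java
// Standalone illustration only; none of these names exist in Nutch.
public class RedirectScoreSketch {

    // Score a newly created datum carries when nothing sets it (1.0f per the report).
    public static final float UNSET_SCORE = 1.0f;

    // Option A: the redirect target simply inherits the score of the redirecting URL.
    public static float passThrough(float originalScore) {
        return originalScore;
    }

    // Option B: the redirect is treated as one additional outlink, so the
    // original score is split across numberOfOutlinks + 1 targets.
    public static float asExtraLink(float originalScore, int numberOfOutlinks) {
        if (numberOfOutlinks < 0) {
            throw new IllegalArgumentException("numberOfOutlinks must be >= 0");
        }
        return originalScore / (numberOfOutlinks + 1);
    }

    public static void main(String[] args) {
        float originalScore = 0.5f;   // made-up score of the redirecting page
        int numberOfOutlinks = 4;     // made-up outlink count of that page

        System.out.println("score left unset: " + UNSET_SCORE);
        System.out.println("option A (pass-through): " + passThrough(originalScore));
        System.out.println("option B (extra link): "
                + asExtraLink(originalScore, numberOfOutlinks));
    }
}
```

With option A the redirect target is scored exactly like the page that redirected to it. With option B a page with no other outlinks still passes its full score on, since the divisor is then 1.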