[ 
https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376227#comment-14376227
 ] 

Sebastian Nagel commented on NUTCH-1958:
----------------------------------------

Scoring-opic is not that bad; scores are plausible even for smaller site 
crawls. An option would be to finally fix our OPIC implementation so that 
scores do not get out of control in long-running incremental crawls. This 
should be possible by keeping the cash separate from the score used for 
indexing. It is a challenge worth taking on, since the problem has been 
known for a long time and some thought has already gone into it 
([[1|http://wiki.apache.org/nutch/FixingOpicScoring]]).
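
To make the "separate cash and indexing score" idea concrete, here is a 
rough sketch. It loosely follows the textbook OPIC formulation (cash plus 
accumulated history) and is not the scoring-opic plugin or the Nutch 
ScoringFilter API. The class, field, and method names are all made up for 
illustration, and dropping the cash of pages without outlinks is a 
simplifying assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: OPIC state with cash kept apart from the indexing score. */
final class OpicSketch {

    /** Per-URL state; field names are illustrative, not Nutch's schema. */
    static final class Page {
        double cash;    // transient: received from inlinks, spent on fetch
        double history; // cumulative: all cash this page has spent so far
    }

    private final Map<String, Page> pages = new HashMap<>();
    private double totalHistory = 0.0;

    /** Give a URL some initial cash, e.g. when it is injected as a seed. */
    public void inject(String url, double initialCash) {
        page(url).cash += initialCash;
    }

    /** On every (re)fetch: bank the cash into history, then hand it to the outlinks. */
    public void fetched(String url, List<String> outlinks) {
        Page p = page(url);
        double spent = p.cash;
        p.history += spent;
        totalHistory += spent;
        p.cash = 0.0; // reset, so a re-fetch cannot distribute the same cash twice
        if (outlinks.isEmpty()) {
            return; // assumption: cash of a page without outlinks is dropped
        }
        double share = spent / outlinks.size();
        for (String target : outlinks) {
            page(target).cash += share;
        }
    }

    /** Value a generator could sort by: the cash currently waiting on the page. */
    public double fetchPriority(String url) {
        return page(url).cash;
    }

    /** Value an indexer could use: this page's share of all history, within [0, 1]. */
    public double indexScore(String url) {
        return totalHistory == 0.0 ? 0.0 : page(url).history / totalHistory;
    }

    private Page page(String url) {
        return pages.computeIfAbsent(url, u -> new Page());
    }
}
```

Because indexScore() is normalized by the total history, it stays bounded no 
matter how many fetch cycles run, while the cash can still be used on its 
own to prioritize fetching.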

> Remove scoring-opic from nutch-default.xml
> ------------------------------------------
>
>                 Key: NUTCH-1958
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1958
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.3, 1.9
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 2.4, 1.10
>
>
> I propose we remove scoring-opic from nutch-default. We all know it is flawed 
> for any kind of incremental crawl, which most of us do. It is also useless 
> for a single crawl: if you must crawl all records of a domain, using OPIC to 
> prioritize URLs makes no sense. It also confuses users, as we have seen in 
> the past and again recently [1].
> What do you think?
> [1]: http://lucene.472066.n3.nabble.com/Nutch-documents-have-huge-scores-in-Solr-td4192064.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
