[jira] [Commented] (NUTCH-1958) Remove scoring-opic from nutch-default.xml

Julien Nioche (JIRA) Mon, 23 Mar 2015 05:43:29 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375857#comment-14375857
 ]


Julien Nioche commented on NUTCH-1958:
--------------------------------------

I agree, but I think there could be benefits in using depth as a default score. 
The main one is that people often get confused between crawl iteration number 
and depth, making the depth explicit via the score would be a good debugging / 
educational step. 

It is a default value, and people will override it or remove it altogether. Not 
having a default value is certainly OK, but having one is better in the sense 
that it helps users realise that there is something there that they can use (or 
not).

Am happy with not having a default value BTW, just thinking aloud here. Thanks!


> Remove scoring-opic from nutch-default.xml
> ------------------------------------------
>
>                 Key: NUTCH-1958
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1958
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.3, 1.9
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 2.4, 1.10
>
>
> I propose we remove scoring-opic from nutch-default. We all know it is flawed 
> for any kind of incremental crawl, which most of us do. It is also useless if 
> you want to perform a single crawl: if you must crawl all records of a 
> domain, using OPIC to prioritize URLs makes no sense. It also confuses 
> users, as we have seen in the past and recently [1].
> What do you think?
> [1]: 
> http://lucene.472066.n3.nabble.com/Nutch-documents-have-huge-scores-in-Solr-td4192064.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

