Hi all,

I have a question concerning updating a site's score in Nutch 1.2.

In the reduce method of org.apache.nutch.crawl.CrawlDbReducer I found a call to
        scfilters.updateDbScore((Text)key, oldSet ? old : null, result, linkList);

While debugging, I found that this call ends up in the updateDbScore method of
the org.apache.nutch.scoring.opic.OPICScoringFilter class. The code for this
method is the following:
  /** Increase the score by a sum of inlinked scores. */
  public void updateDbScore(Text url, CrawlDatum old, CrawlDatum datum,
      List inlinked) throws ScoringFilterException {
    float adjust = 0.0f;
    for (int i = 0; i < inlinked.size(); i++) {
      CrawlDatum linked = (CrawlDatum)inlinked.get(i);
      adjust += linked.getScore();
    }
    if (old == null) old = datum;
    datum.setScore(old.getScore() + adjust);
  }

To my understanding, this code increases a site's score by the sum of its
inlinks' scores every time the site is crawled. So even if the site has not
been modified and no new inlink has been discovered, the site's score will
still increase.
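A minimal standalone sketch of that reading: it copies the arithmetic of the
quoted method, but replaces CrawlDatum with a hypothetical Datum class that only
carries a float score, and drops the url parameter and the
ScoringFilterException. It assumes the same two inlinks, each contributing 0.5,
arrive in every update round.

```java
import java.util.List;

// Standalone simulation of the accumulation in OPICScoringFilter.updateDbScore.
// Datum is a hypothetical stand-in for Nutch's CrawlDatum (score only).
public class OpicAccumulationDemo {

    static final class Datum {
        private float score;
        Datum(float score) { this.score = score; }
        float getScore() { return score; }
        void setScore(float score) { this.score = score; }
    }

    // Same arithmetic as the quoted method: new score = old score + sum of inlink scores.
    static void updateDbScore(Datum old, Datum datum, List<Datum> inlinked) {
        float adjust = 0.0f;
        for (Datum linked : inlinked) {
            adjust += linked.getScore();
        }
        if (old == null) old = datum;
        datum.setScore(old.getScore() + adjust);
    }

    public static void main(String[] args) {
        // Assumption: the same two inlinks (0.5 each) arrive in every round,
        // and nothing about the page or its links changes between rounds.
        List<Datum> sameInlinks = List.of(new Datum(0.5f), new Datum(0.5f));
        Datum stored = new Datum(1.0f);
        for (int round = 1; round <= 3; round++) {
            Datum result = new Datum(stored.getScore());
            updateDbScore(stored, result, sameInlinks);
            stored = result;
            System.out.println("after update " + round + ": " + stored.getScore());
        }
    }
}
```

Running it prints 2.0, 3.0 and 4.0: the stored score grows by the same amount
in each round although the inputs never change.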

Is my understanding of this mechanism correct?
If so, could anyone explain why a site's score is increased in every case?
I would expect it to change only if either its content has changed or a new
inlink has been discovered.

Cheers
David


