Thanks for the quick reply. 

Is there any way I can set this score for one specific site? As I said
earlier, I crawl a handful of sites. One site dominates the search results
because its pages have high scores (many internal links and possibly anchor
pollution). The other sites' pages do not have many incoming internal links,
and their anchor text is either useless or empty.

At the end, I merge all the crawled segments into one for faster searching.
Won't the scores be recalculated at that point? Setting the
db.score.link.internal variable would then affect all sites, wouldn't it?
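
For reference, this is roughly the override I understand you to be
suggesting. It is only a sketch: I am assuming it goes in
conf/nutch-site.xml like other property overrides in nutch-0.9, and the
description text is my own wording. As far as I can tell it is a single
global value, with nothing in it that is scoped to a host:

```xml
<?xml version="1.0"?>
<!-- Sketch of a conf/nutch-site.xml override (file location assumed). -->
<configuration>
  <property>
    <name>db.score.link.internal</name>
    <!-- 0.0 so that internal links contribute nothing to the OPIC score -->
    <value>0.0</value>
    <description>Score factor applied to internal links (my own summary,
    not the official description).</description>
  </property>
</configuration>
```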

Please correct me if I am wrong.


Dennis Kubes-2 wrote:
> 
> Well, the short answer is it doesn't.  Even if you set internal links to 
> be ignored they are still calculated in the OPIC scoring and this 
> negatively affects search relevancy.  The way to handle this is to set 
> the db.score.link.internal variable to 0.0.  This way only external 
> links are counted in OPIC.
> 
> I will post a wiki entry about this process soon.
> 
> Dennis Kubes
> 
> karthik085 wrote:
>> Hi,
>> 
>> I was wondering how db.ignore.internal.links affects rankings under the
>> PageRank and OPIC algorithms. I searched the forum but couldn't get a
>> clear-cut answer.
>> 
>> I am using Nutch 0.7.2 to crawl and index a handful of sites. One site has
>> a lot of pages and interlinks; around 1/3 of my total pages are from this
>> site. Hence, when I search for something and hit 'Show All Hits', the
>> first 5-10 pages are from this site before any results from other sites
>> are shown. How will db.ignore.internal.links help in this case?
>> 
>> Of course, I will have to recrawl with nutch-0.9 to use OPIC
>> algorithm...:-(
>> 
>> Thanks.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/db.ignore.internal.links-and-ranking-algorithms-tf4767180.html#a13636316
Sent from the Nutch - Dev mailing list archive at Nabble.com.
