Thanks for the quick reply. Is there any way I can set this score for one specific site? As I said earlier, I crawl a handful of sites. One site dominates the search results because its pages have high scores (many internal links and possibly anchor pollution). The other sites' pages do not have many incoming internal links, and their anchor text is either useless or empty.
At the end, I merge all the crawled segments into one for faster searching. Won't the scores be recalculated at that point? Setting the db.score.link.internal variable would then affect all sites, won't it? Please correct me if I am wrong.

Dennis Kubes-2 wrote:
>
> Well, the short answer is it doesn't. Even if you set internal links to
> be ignored they are still calculated in the OPIC scoring and this
> negatively affects search relevancy. The way to handle this is to set
> the db.score.link.internal variable to 0.0. This way only external
> links are counted in OPIC.
>
> I will post a wiki entry about this process soon.
>
> Dennis Kubes
>
> karthik085 wrote:
>> Hi,
>>
>> I was wondering how db.ignore.internal.links affects rankings with the
>> PageRank and OPIC algorithms. I searched the forum but couldn't get a
>> clear-cut answer.
>>
>> I am using Nutch 0.7.2 to crawl and index a handful of sites. One site has
>> a lot of pages and interlinks; around 1/3 of my total pages are from this
>> site. Hence, when I search for something and hit 'Show All Hits', the first
>> 5-10 pages are from this site before any results from other sites are shown.
>> How will db.ignore.internal.links help in this case?
>>
>> Of course, I will have to recrawl with nutch-0.9 to use the OPIC
>> algorithm... :-(
>>
>> Thanks.
>
>

--
View this message in context: http://www.nabble.com/db.ignore.internal.links-and-ranking-algorithms-tf4767180.html#a13636316
Sent from the Nutch - Dev mailing list archive at Nabble.com.
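
For reference, the change Dennis describes would presumably be a property override along these lines. This is only a sketch: it assumes the override goes in conf/nutch-site.xml (the usual place for local overrides) and that the Nutch version in use (0.9 in this thread) actually reads db.score.link.internal. As far as I can tell it is a single global property, so it would apply to every site in the crawl, which is the concern raised in the question above.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides (assumed location) -->
<configuration>
  <property>
    <!-- Property name and value taken from Dennis's reply in this thread. -->
    <name>db.score.link.internal</name>
    <!-- 0.0 means internal links contribute nothing to the OPIC score;
         only external links are counted. -->
    <value>0.0</value>
  </property>
</configuration>
```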
