Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NewScoring" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/NewScoring?action=diff&rev1=5&rev2=6

  == Questions ==

- === If internal links are not ignored, would the !LinkRank scores be equivalent to !PageRank scores? ===
+ === If internal links are not ignored, would the LinkRank scores be equivalent to PageRank scores? ===

  To understand this we are required to explain how the !LinkRank scores are calculated exactly.

@@ -62, +62 @@

  }}}

  can be used to change that behavior. By default it ignores links from the same domain and hosts. So a link from news.google.com wouldn't be counted and wouldn't raise the score for www.google.com. The !WebGraph just builds the lists of inlinks, outlinks, and nodes then the !LinkRank class processes that to create the score. !LinkRank does follow very closely to the original pagerank formula which is something like:

- '''(1 - !dampingFactor) + (!dampingFactor * !totalInlinkScore)'''
+ '''(1 - dampingFactor) + (dampingFactor * totalInlinkScore)'''

- Where !totalInlinkScore is the calculated from all the inlinks pointing to a page, taking into account that this is iterative and pages all start off with !rankOne score which is (1 / !numLinksInWebGraph).
+ Where totalInlinkScore is the calculated from all the inlinks pointing to a page, taking into account that this is iterative and pages all start off with rankOne score which is (1 / numLinksInWebGraph).

  The differences are:

@@ -99, +99 @@

  3. A link is a link, it is content agnostic. If you crawl 100m pages
  and do a !LinkRank on that you will see all the usual suspects (Google,
  YouTube, Facebook) but you will also see things like the
- flash download. To LinkRank a link is a link, it isn't particular
+ flash download. To !LinkRank a link is a link, it isn't particular
  in it being a viewable piece of content.
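For readers of this notification, the scoring described in the changed page can be sketched roughly as follows. This is a minimal Python illustration, not the Nutch implementation. It applies the quoted formula iteratively, starts every page at rankOne = 1 / numLinksInWebGraph, and by default drops links between pages on the same domain. Several things here are assumptions that do not come from the page: the names `link_rank` and `domain_of`, the 0.85 damping value, the crude "last two host labels" domain rule, and the choice to compute totalInlinkScore as the sum of each inlinking page's score divided by its outlink count (the usual PageRank convention).

```python
from urllib.parse import urlparse


def domain_of(url):
    """Crude domain guess: last two labels of the host (assumption;
    wrong for suffixes such as .co.uk)."""
    host = urlparse(url).netloc.lower()
    return ".".join(host.split(".")[-2:])


def link_rank(links, damping_factor=0.85, iterations=10, ignore_internal=True):
    """Illustrative LinkRank-style scores for (source_url, target_url) pairs."""
    # By default, ignore links whose source and target share a domain
    # (same host implies same domain, so one check covers both).
    if ignore_internal:
        links = [(s, t) for s, t in links if domain_of(s) != domain_of(t)]
    if not links:
        return {}

    # Build the inlink lists and outlink counts, as the WebGraph step would.
    nodes = {url for pair in links for url in pair}
    inlinks = {n: [] for n in nodes}
    out_count = {n: 0 for n in nodes}
    for s, t in links:
        inlinks[t].append(s)
        out_count[s] += 1

    # Every page starts with rankOne = 1 / numLinksInWebGraph.
    rank_one = 1.0 / len(links)
    scores = {n: rank_one for n in nodes}

    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Assumed definition: each inlink contributes score / outlink count.
            total_inlink_score = sum(scores[s] / out_count[s] for s in inlinks[n])
            new_scores[n] = (1 - damping_factor) + damping_factor * total_inlink_score
        scores = new_scores
    return scores
```

With the default setting, a link from news.google.com to www.google.com is filtered out before scoring, so it cannot raise the score of www.google.com. Passing `ignore_internal=False` keeps such links in the graph.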

