Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "GoogleSummerOfCode/SitemapCrawler" page has been changed by 
LewisJohnMcgibbney:
https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler?action=diff&rev1=1&rev2=2

+ <<TableOfContents(4)>>
+ 
  == Abstract ==
  
  Currently, the Nutch crawler can obtain URLs only from pages that it has 
already scanned. This method is expensive. Also, the importance and 
“change frequency” of these URLs are not known, only guessed. However, an 
up-to-date sitemap file can list all of a site's URLs. For this 
reason, the sitemap files of websites should be crawled. This development 
will add sitemap crawler support to the Nutch project.
