The "SitemapFeature" page has been changed by CihadGuzel:
https://wiki.apache.org/nutch/SitemapFeature?action=diff&rev1=9&rev2=10

  For more information on Sitemaps, see the official page of [[http://www.sitemaps.org/|Sitemap protocol]]

  == Steps to run ==
- For Nutch 1.x:
+ ==== For Nutch 1.x: ====
  {{{
  bin/nutch sitemap <crawldb> [-hostdb <hostdb>] [-sitemapUrls <sitemapUrls>] [-threads <threads>] [-force] [-noFilter] [-noNormalize]
  }}}

@@ -31, +31 @@

  '''-noFilter''' Turn off URLFilters on urls (optional)

  '''-noNormalize''' Turn off URLNormalizer on urls (optional)
+ 
+ ----
+ ==== For Nutch 2.x: ====
+ Please follow [[https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler|here]].
+ ----

  == How Nutch processes Sitemap ? ==

