Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "GoogleSummerOfCode/SitemapCrawler/weeklyreport" page has been changed by CihadGuzel:
https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler/weeklyreport?action=diff&rev1=6&rev2=7

Comment:
Added week 3&4 report

  The robots.txt file is checked while the fetcher job runs. If the robots.txt file lists any sitemap URLs, they are written to the database. A column called sitemap (stm) is added to the db schema. The URLs in the stm column will be parsed on the next run.

- || '''Week :''' 3 (8 June 2015 - 21 June 2015) ||
+ || '''Week :''' 3 & 4 (8 June 2015 - 21 June 2015) ||
  '''Title :''' Sitemap parser plugin is developed.

  A plugin to parse sitemap files has been developed. The plugin makes use of the crawler-commons library. The sitemap file is parsed by the parse plugin. Inlinks from the sitemap file are written to the db. The inlinks will be parsed on the next run.

- || '''Week :''' 4 (22 June 2015 - 28 June 2015) ||
+ || '''Week :''' 5 (22 June 2015 - 28 June 2015) ||
+ ...
- '''Title :'''
-
- ----
- Example:
-
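
The robots.txt step described in the report (collect the sitemap URLs, store them in the stm column, parse them later) could look roughly like the sketch below. This is only an illustration and not the Nutch code: the function name `extract_sitemap_urls`, the dict standing in for a database row, and the example.com URLs are all assumptions made for the example.

```python
def extract_sitemap_urls(robots_txt):
    """Return the URLs given in `Sitemap:` lines of a robots.txt body.

    Hypothetical helper for illustration; Nutch itself is written in Java.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Everything after '#' is a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first ':' only, so the URL's own "://" stays intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# A plain dict stands in for a database row here (assumption);
# the "stm" key mirrors the column name mentioned in the report.
robots_body = (
    "User-agent: *\n"
    "Disallow: /private\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
row = {"stm": extract_sitemap_urls(robots_body)}
```

The field name is matched case-insensitively, and the stored URLs are left unparsed so that a later step can fetch and parse them.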

