Hi all. As you know, I am working on NUTCH-1741 for GSoC 2015. Very little time remains before the final evaluation of the GSoC program, so I want to give a brief update on my progress.
My goal in this project is to add sitemap support to Nutch. The work is almost complete. I have committed my code to my GitHub account [1] and attached the patch file to the issue [2].

The features developed so far are:
+ Sitemap files are crawled (inject, generate, fetch and parse).
+ If a host has any sitemap files, they are detected.
+ If desired, only sitemap URLs or only other (non-sitemap) URLs can be crawled.
+ The feature is activated with a single parameter (-sitemap).

Please see the wiki [3] and the issue [2] for more information. Thanks to my mentors (Lewis & Talat) and to the Nutch community.

[1] - https://github.com/cguzel/nutch-sitemapCrawler
[2] - https://issues.apache.org/jira/browse/NUTCH-1741
[3] - https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler

--
Kind regards,
Cihad Guzel

