Can you please be more specific about your environment and what you found to be out of date?
On Aug 1, 2017 5:28 PM, "Michael Chen" <[email protected]> wrote:

> Problem resolved. The crawl script and web documentation are out of date.
> Nutch script works fine.
>
> Might be a good idea to update sitemap related documentation at some
> point... takes quite a bit of speculation and experimentation right now...
>
> Thanks!
>
> Michael
>
>
> On 07/31/2017 12:21 PM, Michael Chen wrote:
>
>> Dear fellow Nutch developers,
>>
>> I've been trying to use Nutch 2 sitemap function to crawl and index all
>> pages on the sitemap indices. It seems that integration with CommonCrawler
>> sitemap tools only exist in 2.x branch. But after I got it to work with
>> Hbase 1.2.3, it didn't fetch, parse and index the sitemap indices and
>> sitemaps at all.
>>
>> I also looked into the code a bit and everything seems to make sense,
>> except I couldn't further trace the data flow beyond Toolrunner.run() in
>> the FetchReducer. I'm testing it on Linux with the "crawl" script in /bin,
>> so I'm not sure if how I can debug this. Please let me know if there's any
>> further information that I can provide you with to help troubleshoot this
>> issue. Thanks in advance!
>>
>> Best regards,
>>
>> Michael

