Can you please be more specific about your environment and what you have
found to be out of date?

On Aug 1, 2017 5:28 PM, "Michael Chen" <[email protected]>
wrote:

> Problem resolved. The crawl script and web documentation are out of date.
> Nutch script works fine.
>
> Might be a good idea to update the sitemap-related documentation at some
> point... it takes quite a bit of speculation and experimentation right now...
>
> Thanks!
>
> Michael
>
>
> On 07/31/2017 12:21 PM, Michael Chen wrote:
>
>> Dear fellow Nutch developers,
>>
>> I've been trying to use the Nutch 2 sitemap function to crawl and index all
>> pages on the sitemap indices. It seems that integration with CommonCrawler
>> sitemap tools only exists in the 2.x branch. But after I got it to work with
>> HBase 1.2.3, it didn't fetch, parse and index the sitemap indices and
>> sitemaps at all.
>>
>> I also looked into the code a bit and everything seems to make sense,
>> except I couldn't further trace the data flow beyond ToolRunner.run() in
>> the FetchReducer. I'm testing it on Linux with the "crawl" script in /bin,
>> so I'm not sure how I can debug this. Please let me know if there's any
>> further information that I can provide you with to help troubleshoot this
>> issue. Thanks in advance!
>>
>> Best regards,
>>
>> Michael
>>
>>
>>
>
