Is there some way to find out why the crawler has rejected a page? I'm trying
to get Nutch 1.1 working, and it doesn't seem to crawl the same way that
1.0 does.

I have Tika included in the parse- section of conf/nutch-site.xml.
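
For illustration, an entry of that kind might look like the sketch below. It
assumes the "parse- section" means the parse-(...) group of the plugin.includes
property. The rest of the plugin list is an assumption modeled on typical
Nutch 1.x settings, not a copy of the actual file from this setup.

```xml
<!-- Hypothetical conf/nutch-site.xml fragment; the plugin list is illustrative only -->
<property>
  <name>plugin.includes</name>
  <!-- parse-(html|tika) is the part that enables the Tika parser alongside the HTML parser -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```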

I have DEBUG logging set for all the crawl sections, but the output doesn't
really say why it's rejecting a site.

I have the crawler set to not follow external links and I seed the top level
of each site.
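
A minimal sketch of that restriction, assuming it is done through the
db.ignore.external.links property in conf/nutch-site.xml rather than through
rules in the URL filter files:

```xml
<!-- Hypothetical conf/nutch-site.xml fragment; assumes the property-based approach -->
<property>
  <name>db.ignore.external.links</name>
  <!-- true: outlinks pointing to a different host than the source page are dropped -->
  <value>true</value>
</property>
```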

I'm just unclear on how to proceed to troubleshoot this.
