Re: [ManifoldCF] Crawling with the WEB repository connector causes Repeated service interruptions

2012-03-18 Thread Shigeki Kobayashi
Karl, Thanks for your reply. It seems that Tika failed to extract content from PDF files while the crawler was following web links. I confirmed that the Solr exceptions were followed by Tika exceptions. So Solr, on detecting a Tika exception, returns status code 500, and MCF then retries ingesting certain

Re: [ManifoldCF] Crawling with the WEB repository connector causes Repeated service interruptions

2012-03-18 Thread Shigeki Kobayashi
Abe-san, Thank you for the info. That's a good idea. I hope I can avoid the job interruption this way. Regards, Shigeki 2012/3/19 Shinichiro Abe shinichiro.ab...@gmail.com Hi, Currently MCF can't ignore a 500 server error caused by Solr. If you can upgrade to Solr 3.2, you can

Re: [ManifoldCF] Crawling with the WEB repository connector causes Repeated service interruptions

2012-03-16 Thread Karl Wright
Hi Shigeki, A service interruption means that a connector (either a repository connector like the web connector or an output connector like the Solr connector) could not communicate with the configured service. "Repeated service interruptions" means that certain URLs failed to fetch properly even