I had been crawling web sites that link to HTML and PDF files with the
provided multiprocess-example agent for a few hours when Simple History
started showing result code -104 with the message "Interrupted: Job no
longer active".

After the same error had occurred about 40 times, the job status changed
to "Aborting" and the job finally ended with "Error: Repeated service
interruptions - failure processing document: Ingestion HTTP error code
500".

The job was interrupted and stopped.

Does anyone know what causes "Repeated service interruptions" and makes
the job stop?
Also, under what circumstances does result code -104 occur, and what does
it mean?

If you have any ideas, please advise me on how to avoid this error.


I am using the following:

Solr 1.4 (ExtractingRequestHandler is configured)
ManifoldCF 0.4 (multiprocess-example)
- Repository connector: WEB
- Output connector: Solr
Tomcat 6.0.29
PostgreSQL 9.1.3


Here is MCF’s debug log right before the job was interrupted:

DEBUG 2012-03-15 20:04:16,325 (Worker thread '4') - WEB: Attempting to get
connection to http://xx.xx.xx.xx:80 (95697 ms)
DEBUG 2012-03-15 20:04:16,325 (Worker thread '4') - WEB: Waiting 3895 ms
before starting fetch on http://xx.xx.xx.xx:80
DEBUG 2012-03-15 20:04:20,221 (Worker thread '4') - WEB: Attempting to get
connection to http://xx.xx.xx.xx:80 (99593 ms)
DEBUG 2012-03-15 20:04:20,221 (Worker thread '4') - WEB: Successfully got
connection to http://xx.xx.xx.xx:80 (99593 ms)
DEBUG 2012-03-15 20:04:20,221 (Worker thread '4') - WEB: Waiting for an
HttpClient object
DEBUG 2012-03-15 20:04:20,221 (Worker thread '4') - WEB: Got an HttpClient
object after 0 ms.
DEBUG 2012-03-15 20:04:20,221 (Worker thread '4') - WEB: Get method for
'/xx/xx.pdf'
DEBUG 2012-03-15 20:04:20,222 (Worker thread '4') - WEB: For
http://xx.xx/xx/xx.pdf, setting virtual host to xx.xx
DEBUG 2012-03-15 20:04:20,315 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 128 ms.
DEBUG 2012-03-15 20:04:20,445 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,509 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,573 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,637 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,701 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,765 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,829 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,893 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:20,957 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:21,021 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:21,085 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:21,149 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:21,213 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
DEBUG 2012-03-15 20:04:21,277 (Worker thread '4') - WEB: Performing a read
wait on bin 'xx.xx' of 62 ms.
 INFO 2012-03-15 20:04:21,344 (Worker thread '4') - WEB: FETCH URL|http://xx.xx/xx/xx.pdf|1331809460221+1122|-104|65536|org.apache.manifoldcf.core.interfaces.ManifoldCFException|Interrupted: Job no longer active
DEBUG 2012-03-15 20:04:21,344 (Worker thread '4') - WEB: Fetch exception for 'http://xx.xx/xx/xx.pdf'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Interrupted: Job no longer active
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection.noteInterrupted(ThrottledFetcher.java:1735)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:743)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:318)
Caused by: org.apache.manifoldcf.agents.interfaces.ServiceInterruption: Job no longer active
        at org.apache.manifoldcf.crawler.system.WorkerThread$VersionActivity.checkJobStillActive(WorkerThread.java:1223)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:135)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:713)
        ... 1 more
 WARN 2012-03-15 20:04:21,345 (Worker thread '4') - Pre-ingest service
interruption reported for job 1331716457096 connection 'web': Job no longer
active
DEBUG 2012-03-15 20:04:23,871 (Job reset thread) - Stopped job 1331716457096
DEBUG 2012-03-15 20:04:24,236 (Job notification thread) - Found job
1331716457096 in need of notification
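
To check how often the -104 result shows up, the "FETCH URL" lines can be
pulled out of the log with a short script. The Python sketch below is only
an illustration and rests on assumptions: the field names are a guess at
what the pipe-separated values in the line above mean (they are not taken
from the ManifoldCF documentation), each log entry is assumed to sit on a
single unwrapped line, and parse_fetch_line and FetchRecord are made-up
names.

```python
from typing import NamedTuple, Optional

MARKER = "WEB: FETCH URL|"


class FetchRecord(NamedTuple):
    # Field names are guesses based on the sample line, not an official schema.
    url: str
    timing: str           # looks like "<start millis>+<elapsed ms>"
    result_code: int      # e.g. -104
    byte_count: int       # assumed to be a byte count, e.g. 65536
    exception_class: str  # may be empty
    message: str          # may be empty


def parse_fetch_line(line: str) -> Optional[FetchRecord]:
    """Return the fields of one unwrapped FETCH URL log line, or None."""
    idx = line.find(MARKER)
    if idx < 0:
        return None
    # Split into at most six fields so '|' inside the message is preserved.
    fields = line[idx + len(MARKER):].strip().split("|", 5)
    if len(fields) < 4:
        return None
    # Pad the optional trailing fields (exception class, message).
    fields += [""] * (6 - len(fields))
    url, timing, code, size, exc, msg = fields
    try:
        return FetchRecord(url, timing, int(code), int(size), exc, msg)
    except ValueError:
        # The numeric fields were not integers, so this is not a line we know.
        return None


sample = (
    " INFO 2012-03-15 20:04:21,344 (Worker thread '4') - WEB: FETCH URL|"
    "http://xx.xx/xx/xx.pdf|1331809460221+1122|-104|65536|"
    "org.apache.manifoldcf.core.interfaces.ManifoldCFException|"
    "Interrupted: Job no longer active"
)
record = parse_fetch_line(sample)
```

Running parse_fetch_line over every line of the log and counting the
records whose result_code is -104 gives the number of interrupted fetches.
A URL that itself contains a "|" character would break this simple split.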
