Hello,

Using the Solr output connector and the SP2010 repository connector, I am indexing a document library named Documents. This library contains some scanned PDF documents. The very first crawl indexes all 91 docs. When I hit "Re-ingest all associated documents" and start a second crawl, I get: "Error: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3"
When I look at http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf, it is an image (scanned) PDF.

Here is the stack trace:

WARN 2012-08-14 05:13:22,068 (Worker thread '39') - SharePoint: Error closing connection to file 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf': Connection reset
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:113)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
    at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
    at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(Unknown Source)
    at org.apache.commons.httpclient.ContentLengthInputStream.close(Unknown Source)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(Unknown Source)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(Unknown Source)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1457)
    at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
DEBUG 2012-08-14 05:13:22,072 (Worker thread '42') - SharePoint: Path attribute name is null
WARN 2012-08-14 05:13:22,081 (Worker thread '39') - SharePoint: IOException thrown: Connection reset
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
    at java.io.FilterInputStream.read(FilterInputStream.java:90)
    at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
    at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1447)
    at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
WARN 2012-08-14 05:13:22,186 (Worker thread '39') - Service interruption reported for job 1344906886879 connection 'SP2010': SharePoint is down attempting to read 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf', retrying: Connection reset
ERROR 2012-08-14 05:13:22,230 (Worker thread '39') - Exception tossed: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
    at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:711)
    at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:2435)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:745)
