If I change my path rules to index only *.doc and *.docx files, I can re-index over and over without restarting anything, and everything works fine. So the problem seems to be with files whose text cannot be extracted, such as the scanned PDFs.
/Documents/*.doc   file   include
/Documents/*.docx  file   include

--- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:

> From: Ahmet Arslan <[email protected]>
> Subject: Re: SharePoint: Error closing connection to file
> To: [email protected]
> Date: Tuesday, August 14, 2012, 5:20 AM
>
> Also, after this, when I hit "View Repository Connection Status" I get:
>
> Got an unknown remote exception accessing site - axis fault = Server.userException, detail = java.net.UnknownHostException: null
>
> After I restart MCF, I get "Connection status: Connection working" on the "View Repository Connection Status" page.
>
> --- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:
>
> > From: Ahmet Arslan <[email protected]>
> > Subject: SharePoint: Error closing connection to file
> > To: [email protected]
> > Date: Tuesday, August 14, 2012, 5:18 AM
> >
> > Hello,
> >
> > Using the Solr output connector and the SP2010 repository connector, I am indexing a document library named Documents. This library contains some scanned PDF documents. The very first crawl indexes all 91 docs.
> > When I hit "Re-ingest all associated documents" and start a second crawl, I get: "Error: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3"
> >
> > When I look at http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf, it is an image (scanned) PDF.
> >
> > Here is the stack trace:
> >
> > WARN 2012-08-14 05:13:22,068 (Worker thread '39') - SharePoint: Error closing connection to file 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf': Connection reset
> > java.net.SocketException: Connection reset
> >     at java.net.SocketInputStream.read(SocketInputStream.java:113)
> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(Unknown Source)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.close(Unknown Source)
> >     at java.io.FilterInputStream.close(FilterInputStream.java:155)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(Unknown Source)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.close(Unknown Source)
> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1457)
> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
> > DEBUG 2012-08-14 05:13:22,072 (Worker thread '42') - SharePoint: Path attribute name is null
> > WARN 2012-08-14 05:13:22,081 (Worker thread '39') - SharePoint: IOException thrown: Connection reset
> > java.net.SocketException: Connection reset
> >     at java.net.SocketInputStream.read(SocketInputStream.java:168)
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:90)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1447)
> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
> > WARN 2012-08-14 05:13:22,186 (Worker thread '39') - Service interruption reported for job 1344906886879 connection 'SP2010': SharePoint is down attempting to read 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf', retrying: Connection reset
> > ERROR 2012-08-14 05:13:22,230 (Worker thread '39') - Exception tossed: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
> >     at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:711)
> >     at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:2435)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:745)
