If I change my path rules to index only *.doc and *.docx files, I can re-index
over and over without restarting anything, and every crawl completes normally.
So the problem seems to be specific to files with no extractable text, such as the scanned PDFs.

/Documents/*.doc        file    include
/Documents/*.docx       file    include
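To make the effect of these two rules concrete, here is a small Java sketch that checks which paths they would include. It assumes that `*` matches any run of characters, including `/`, so Word files in subfolders of Documents are covered. That is an assumption, not something confirmed against ManifoldCF's own path-rule matching, and the class and method names are hypothetical. Under that assumption, the scanned PDF from the log below is never selected, so it is never fetched.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: mirrors the two "file include" rules above.
public class PathRuleSketch {
    static final List<String> INCLUDE_RULES = List.of("/Documents/*.doc", "/Documents/*.docx");

    // ASSUMPTION: '*' matches any sequence of characters, including '/'.
    static Pattern toRegex(String rule) {
        StringBuilder regex = new StringBuilder();
        for (char c : rule.toCharArray()) {
            // '*' becomes ".*"; every other character is matched literally.
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // A path is included if it fully matches at least one include rule.
    public static boolean isIncluded(String path) {
        for (String rule : INCLUDE_RULES) {
            if (toRegex(rule).matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/Documents/report.doc",
            "/Documents/ik_docs/letter.docx",
            "/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf"
        };
        for (String p : samples) {
            System.out.println(p + " -> " + (isIncluded(p) ? "include" : "skip"));
        }
    }
}
```

Running `main` prints `include` for the two Word paths and `skip` for the PDF.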

--- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:

> From: Ahmet Arslan <[email protected]>
> Subject: Re: SharePoint: Error closing connection to file
> To: [email protected]
> Date: Tuesday, August 14, 2012, 5:20 AM
> 
> Also, after this, when I click "View Repository Connection Status", I get:
> 
> Got an unknown remote exception accessing site - axis fault = Server.userException, detail = java.net.UnknownHostException: null
> 
> After I restart MCF, I get "Connection status: Connection working" on the "View Repository Connection Status" page.
> 
> --- On Tue, 8/14/12, Ahmet Arslan <[email protected]> wrote:
> 
> > From: Ahmet Arslan <[email protected]>
> > Subject: SharePoint: Error closing connection to file
> > To: [email protected]
> > Date: Tuesday, August 14, 2012, 5:18 AM
> > Hello,
> > 
> > Using the Solr output connector and the SP2010 repository connector, I am indexing a document library named Documents. This library contains some scanned PDF documents. The very first crawl indexes all 91 documents.
> > When I click "Re-ingest all associated documents" and start a second crawl, I get: "Error: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3"
> > 
> > The file named in the stack trace below, http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf, is an image-only (scanned) PDF.
> > 
> > Here is the stack trace:
> > 
> > WARN 2012-08-14 05:13:22,068 (Worker thread '39') - SharePoint: Error closing connection to file 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf': Connection reset
> > java.net.SocketException: Connection reset
> >     at java.net.SocketInputStream.read(SocketInputStream.java:113)
> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(Unknown Source)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.close(Unknown Source)
> >     at java.io.FilterInputStream.close(FilterInputStream.java:155)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(Unknown Source)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.close(Unknown Source)
> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1457)
> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
> > DEBUG 2012-08-14 05:13:22,072 (Worker thread '42') - SharePoint: Path attribute name is null
> >  WARN 2012-08-14 05:13:22,081 (Worker thread '39') - SharePoint: IOException thrown: Connection reset
> > java.net.SocketException: Connection reset
> >     at java.net.SocketInputStream.read(SocketInputStream.java:168)
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >     at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
> >     at java.io.FilterInputStream.read(FilterInputStream.java:90)
> >     at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1447)
> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:549)
> >  WARN 2012-08-14 05:13:22,186 (Worker thread '39') - Service interruption reported for job 1344906886879 connection 'SP2010': SharePoint is down attempting to read 'http://iknowtest/Documents/ik_docs/vize_evraklari/ticaret_sicil_gazetesi.pdf', retrying: Connection reset
> > ERROR 2012-08-14 05:13:22,230 (Worker thread '39') - Exception tossed: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1344907007021, expecting active status, saw 3
> >     at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:711)
> >     at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:2435)
> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:745)
> >
>
