I checked in a fix for this ticket on trunk. Please let me know if it resolves this issue.
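For anyone following along, the failure mode Karl describes below can be modeled in a few lines: HttpClient retries a failed send by default, but the connector streams documents with a one-shot entity that cannot be replayed, so the retry itself fails and masks the real error. This is an illustrative, self-contained sketch (the method and return strings are hypothetical, not HttpClient API):

```java
public class NonRepeatableRetrySketch {
    /** Models HttpClient's retry loop for a send whose first attempt dies
     *  mid-stream (e.g. the server drops the connection after a 413).
     *  canReplayBody corresponds to HttpEntity.isRepeatable(); a streamed
     *  document body is NOT repeatable. */
    static String send(int maxRetries, boolean canReplayBody) {
        for (int attempt = 0; ; attempt++) {
            boolean failed = (attempt == 0);   // only the first send fails in this sketch
            if (!failed)
                return "200 OK";               // a successful retry
            if (attempt == maxRetries)
                return "SocketException: Broken pipe";   // no retries left: the real error surfaces
            if (!canReplayBody)
                return "NonRepeatableRequestException";  // retry requested, but the body is already consumed
        }
    }

    public static void main(String[] args) {
        // Default behavior (3 retries) masks the underlying failure:
        System.out.println(send(3, false));  // NonRepeatableRequestException
        // With retries disabled, the connector sees the real error and
        // can apply its own retry/skip logic:
        System.out.println(send(0, false));  // SocketException: Broken pipe
    }
}
```

This is why disabling retries in the Solr connector matters: with retries on, every failure surfaces as the generic NonRepeatableRequestException seen in the traces below instead of the actual cause.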
Karl

On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <[email protected]> wrote:
> This is because httpclient retries on error three times by default.
> This has to be disabled in the Solr connector, or the rest of the
> logic won't work right.
>
> I've opened a ticket (CONNECTORS-610) for this problem too.
>
> Karl
>
> On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <[email protected]> wrote:
>> Hi Karl,
>>
>> Thanks for the quick fix.
>>
>> I am still seeing the following error after 'svn up' and 'ant build':
>>
>> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> Caused by: org.apache.http.client.ClientProtocolException
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
>> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
>>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>         ... 6 more
>> Caused by: java.net.SocketException: Broken pipe
>>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>>         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>>         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>>         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>>         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>>         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>>         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>>         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>>         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>>         ... 8 more
>>
>> --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
>>
>>> From: Karl Wright <[email protected]>
>>> Subject: Re: Repeated service interruptions - failure processing document: null
>>> To: [email protected]
>>> Date: Monday, January 14, 2013, 3:30 PM
>>>
>>> Hi Ahmet,
>>>
>>> The exception that seems to be causing the abort is a socket
>>> exception coming from a socket write:
>>>
>>> > Caused by: java.net.SocketException: Broken pipe
>>>
>>> This makes sense in light of the http code returned from Solr, which
>>> was 413: http://www.checkupdown.com/status/E413.html .
>>>
>>> So there is nothing actually *wrong* with the .aspx documents, but
>>> they are just way too big, and Solr is rejecting them for that
>>> reason.
>>>
>>> Clearly, though, the Solr connector should recognize this code as
>>> meaning "never retry", so instead of killing the job, it should just
>>> skip the document. I'll open a ticket for that now.
>>>
>>> Karl
>>>
>>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <[email protected]> wrote:
>>> > Hello,
>>> >
>>> > I am indexing a SharePoint 2010 instance using mcf-trunk (at revision 1432907).
>>> >
>>> > There is no problem with a Document library that contains Word, Excel, etc.
>>> >
>>> > However, I receive the following errors with a Document library that has *.aspx files in it.
>>> >
>>> > Status of Jobs => Error: Repeated service interruptions - failure processing document: null
>>> >
>>> > WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
>>> > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
>>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>>> >         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>>> > Caused by: org.apache.http.client.ClientProtocolException
>>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>> >         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> >         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
>>> > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
>>> >         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>> >         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>> >         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>> >         ... 6 more
>>> > Caused by: java.net.SocketException: Broken pipe
>>> >         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>> >         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>> >         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>> >         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>>> >         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>>> >         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>>> >         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>>> >         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>>> >         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>>> >         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>>> >         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>>> >         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>>> >         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>>> >         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>> >         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>>> >         ... 8 more
>>> >
>>> > Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>>> >
>>> > ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>>> >         at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>>> >         at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>>> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>>> >         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>>> >         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>>> >         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>> >         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>>> >
>>> > On the Solr side I see:
>>> >
>>> > INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
>>> > 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
>>> >
>>> > Thanks,
>>> > Ahmet
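The second ticket Karl mentions (treating 413 as "never retry") boils down to classifying the HTTP status Solr returns by whether a retry can ever succeed. Here is a minimal sketch of that policy; the class, enum, and method names are illustrative, not the actual HttpPoster code:

```java
public class SolrStatusPolicy {
    enum Disposition { ACCEPTED, SKIP_DOCUMENT, RETRY_LATER }

    /** 2xx: the document was indexed. 413 (Request Entity Too Large): the
     *  document can never fit, so retrying is pointless; skip it instead of
     *  aborting the whole job. Anything else: treat it as a transient
     *  service interruption and retry later. */
    static Disposition classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300)
            return Disposition.ACCEPTED;
        if (httpStatus == 413)
            return Disposition.SKIP_DOCUMENT;
        return Disposition.RETRY_LATER;
    }

    public static void main(String[] args) {
        System.out.println(classify(413));  // SKIP_DOCUMENT
    }
}
```

The key design point is the distinction between permanent per-document failures (skip and move on) and transient server-side failures (back off and retry), which is what separates "the job aborts" from "one oversized .aspx file is skipped".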
