Hi Karl,

Now 39 aspx files (out of 130) are indexed. The job didn't get killed, and there are no exceptions in the log.
I increased the maximum POST size of Solr/Jetty, but that count of 39 didn't increase. I will check the sizes of the remaining 130 - 39 *.aspx files.

Actually, I am mapping the extracted content of these aspx files to an ignored dynamic field (fmap.content=content_ignored); I don't use it. I am only interested in the metadata of these aspx files. It would be great if there were a setting to grab just the metadata, similar to Lists.

Thanks,
Ahmet

--- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:

> From: Karl Wright <[email protected]>
> Subject: Re: Repeated service interruptions - failure processing document: null
> To: [email protected]
> Date: Monday, January 14, 2013, 5:46 PM
>
> I checked in a fix for this ticket on trunk. Please let me know if it
> resolves this issue.
>
> Karl
>
> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <[email protected]> wrote:
> > This is because httpclient is retrying on error three times by
> > default. This has to be disabled in the Solr connector, or the rest
> > of the logic won't work right.
> >
> > I've opened a ticket (CONNECTORS-610) for this problem too.
> >
> > Karl
> >
> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <[email protected]> wrote:
> >> Hi Karl,
> >>
> >> Thanks for the quick fix.
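(For context, the mapping described above can be exercised directly against Solr's ExtractingRequestHandler. This is a sketch only: it assumes the stock /update/extract handler, the "all" core from the logs below, and a schema with an ignored-type content_ignored field; the document id and file name are hypothetical.)

```shell
# Route the extracted body into an ignored field so only metadata is indexed.
curl "http://localhost:8983/solr/all/update/extract?literal.id=doc1&fmap.content=content_ignored&commit=true" \
  -F "file=@page.aspx"
```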
> >>
> >> I am still seeing the following error after 'svn up' and 'ant build':
> >>
> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
> >> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> >>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >> Caused by: org.apache.http.client.ClientProtocolException
> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
> >> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
> >>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>     ... 6 more
> >> Caused by: java.net.SocketException: Broken pipe
> >>     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>     ... 8 more
> >>
> >> --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> >>
> >>> From: Karl Wright <[email protected]>
> >>> Subject: Re: Repeated service interruptions - failure processing document: null
> >>> To: [email protected]
> >>> Date: Monday, January 14, 2013, 3:30 PM
> >>>
> >>> Hi Ahmet,
> >>>
> >>> The exception that seems to be causing the abort is a socket exception
> >>> coming from a socket write:
> >>>
> >>> > Caused by: java.net.SocketException: Broken pipe
> >>>
> >>> This makes sense in light of the http code returned from Solr, which
> >>> was 413: http://www.checkupdown.com/status/E413.html .
> >>>
> >>> So there is nothing actually *wrong* with the .aspx documents; they
> >>> are just way too big, and Solr is rejecting them for that reason.
> >>>
> >>> Clearly, though, the Solr connector should recognize this code as
> >>> meaning "never retry", so instead of killing the job, it should just
> >>> skip the document.  I'll open a ticket for that now.
> >>>
> >>> Karl
> >>>
> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <[email protected]> wrote:
> >>> > Hello,
> >>> >
> >>> > I am indexing a SharePoint 2010 instance using mcf-trunk (at revision 1432907).
> >>> >
> >>> > There is no problem with a Document library that contains Word, Excel, etc.
> >>> >
> >>> > However, I receive the following errors with a Document library that has *.aspx files in it.
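(The "never retry" decision Karl describes above can be sketched as a simple status-code classification: 4xx responses are client errors that will fail identically on every retry, so the document should be skipped, while 5xx responses may be transient. Class and method names here are hypothetical, not the actual Solr connector code.)

```java
// Sketch: classify an HTTP status from Solr as retryable or not.
// A 413 (Request Entity Too Large) will reject the same document every
// time, so retrying only kills the job; skip the document instead.
public class RetryPolicy {
    /** Returns true if an indexing attempt with this status is worth retrying. */
    public static boolean shouldRetry(int httpStatus) {
        if (httpStatus >= 400 && httpStatus < 500) {
            // Client errors (413, 400, ...) are permanent for this document.
            return false;
        }
        // Server errors (5xx) may be transient; anything else is not retried.
        return httpStatus >= 500;
    }
}
```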
> >>> >
> >>> > Status of Jobs => Error: Repeated service interruptions - failure processing document: null
> >>> >
> >>> > WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
> >>> > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
> >>> >
> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >>> > Caused by: org.apache.http.client.ClientProtocolException
> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> >>> > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>> >     ... 6 more
> >>> > Caused by: java.net.SocketException: Broken pipe
> >>> >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >>> >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>> >     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>> >     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>> >     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>> >     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>> >     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>> >     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>> >     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>> >     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>> >     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>> >     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>> >     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>> >     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>> >     ... 8 more
> >>> >
> >>> > Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >>> >
> >>> > ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >>> >
> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
> >>> >     at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
> >>> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
> >>> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
> >>> >
> >>> > On the Solr side I see:
> >>> >
> >>> > INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
> >>> > 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
> >>> >
> >>> > Thanks,
> >>> > Ahmet
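(One observation on the WARN line above: "HttpParser:Full" with buffer sizes of 6144 appears to be Jetty reporting an overflowed request header buffer, i.e. this 413 seems to be about header/URL size rather than POST body size, which would explain why raising the POST limit alone didn't change anything. If that reading is right, a hedged sketch of the relevant setting, assuming the example etc/jetty.xml shipped with Solr 4.x's Jetty, would be:)

```xml
<!-- Sketch: raise the request header buffer on the connector so large
     metadata parameter sets fit. The 8192 value is illustrative only. -->
<New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
  <Set name="host"><SystemProperty name="jetty.host" /></Set>
  <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
  <Set name="requestHeaderSize">8192</Set>
</New>
```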
