Hi Karl,

Now 39 aspx files (out of 130) are indexed. The job didn't get killed, and there
are no exceptions in the log.

I increased the maximum POST size of Solr/Jetty, but that number (39) didn't 
increase.
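One thing worth noting: the 413 "FULL head" in the earlier trace (and the
HttpParser "Full [...g=6144,p=6144...]" warning on the Solr side) points at
Jetty's request *header* buffer rather than the POST body limit, which may be
why raising the POST size alone didn't change anything. A sketch of the
relevant connector setting, assuming the Jetty that ships with the Solr 4.x
example (the connector class, file layout, and defaults vary by Jetty version):

```xml
<!-- etc/jetty.xml in the Solr example distribution (illustrative only;
     adjust to match the connector actually configured there) -->
<Call name="addConnector">
  <Arg>
    <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <!-- the default header buffer is a few KB (the "g=6144" in the warning);
           oversized request headers or very long URLs overflow it and Jetty
           answers 413 "FULL head" -->
      <Set name="requestHeaderSize">65536</Set>
    </New>
  </Arg>
</Call>
```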

I will check the sizes of the remaining 91 (130 - 39) *.aspx files.

Actually, I am mapping the extracted content of these aspx files to an ignored 
dynamic field (fmap.content=content_ignored); I don't use the content. I am 
only interested in the metadata of these aspx files. It would be great if 
there were a setting to grab just the metadata, similar to Lists.
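For context, fmap.* is an ExtractingRequestHandler (Solr Cell) parameter, and
the "map it to an ignored field" trick relies on the field type behind the
dynamic field. A sketch of the schema.xml pieces this setup assumes (the
content_ignored name is from the paragraph above; the fieldtype mirrors the
stock Solr example schema, which spells the pattern ignored_* rather than
*_ignored):

```xml
<!-- schema.xml: the stock "ignored" fieldtype neither indexes nor stores -->
<fieldType name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField"/>
<!-- dynamic field so content_ignored (or anything matching the pattern)
     resolves without an explicit field definition -->
<dynamicField name="*_ignored" type="ignored" multiValued="true"/>
```

With stored="false" and indexed="false" the mapped content is discarded at
index time; note, though, that it still crosses the wire in the POST, so this
alone would not help with the 413 rejections.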

Thanks,
Ahmet

--- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:

> From: Karl Wright <[email protected]>
> Subject: Re: Repeated service interruptions - failure processing document: 
> null
> To: [email protected]
> Date: Monday, January 14, 2013, 5:46 PM
> I checked in a fix for this ticket on
> trunk.  Please let me know if it
> resolves this issue.
> 
> Karl
> 
> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <[email protected]>
> wrote:
> > This is because httpclient is retrying on error for
> three times by
> > default.  This has to be disabled in the Solr
> connector, or the rest
> > of the logic won't work right.
> >
> > I've opened a ticket (CONNECTORS-610) for this problem
> too.
> >
> > Karl
> >
> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <[email protected]>
> wrote:
> >> Hi Karl,
> >>
> >> Thanks for quick fix.
> >>
> >> I am still seeing the following error after 'svn
> up' and 'ant build'
> >>
> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') -
> Exception tossed: Repeated service interruptions - failure
> processing document: null
> >>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> Repeated service interruptions - failure processing
> document: null
> >>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >> Caused by:
> org.apache.http.client.ClientProtocolException
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>         at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>         at
> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
> >> Caused by:
> org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity. 
> The cause lists the reason the original request failed.
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>         ... 6 more
> >> Caused by: java.net.SocketException: Broken pipe
> >>         at
> java.net.SocketOutputStream.socketWrite0(Native Method)
> >>         at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>         at
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>         at
> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>         at
> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>         at
> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>         at
> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>         at
> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>         at
> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>         at
> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>         at
> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>         at
> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>         at
> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>         at
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>         ... 8 more
> >>
> >>
> >>
> >> --- On Mon, 1/14/13, Karl Wright <[email protected]>
> wrote:
> >>
> >>> From: Karl Wright <[email protected]>
> >>> Subject: Re: Repeated service interruptions -
> failure processing document: null
> >>> To: [email protected]
> >>> Date: Monday, January 14, 2013, 3:30 PM
> >>> Hi Ahmet,
> >>>
> >>> The exception that seems to be causing the
> abort is a socket
> >>> exception
> >>> coming from a socket write:
> >>>
> >>> > Caused by: java.net.SocketException:
> Broken pipe
> >>>
> >>> This makes sense in light of the http code
> returned from
> >>> Solr, which
> >>> was 413:  http://www.checkupdown.com/status/E413.html .
> >>>
> >>> So there is nothing actually *wrong* with the
> .aspx
> >>> documents, but
> >>> they are just way too big, and Solr is
> rejecting them for
> >>> that reason.
> >>>
> >>> Clearly, though, the Solr connector should
> recognize this
> >>> code as
> >>> meaning "never retry", so instead of killing
> the job, it
> >>> should just
> >>> skip the document.  I'll open a ticket for
> that now.
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan
> <[email protected]>
> >>> wrote:
> >>> > Hello,
> >>> >
> >>> > I am indexing a SharePoint 2010 instance
> using
> >>> mcf-trunk (At revision 1432907)
> >>> >
> >>> > There is no problem with a Document
> library that
> >>> contains word excel etc.
> >>> >
> >>> > However, I receive the following errors
> with a Document
> >>> library that has *.aspx files in it.
> >>> >
> >>> > Status of Jobs => Error: Repeated
> service
> >>> interruptions - failure processing document:
> null
> >>> >
> >>> >  WARN 2013-01-14 15:00:12,720 (Worker
> thread '13')
> >>> - Service interruption reported for job
> 1358009105156
> >>> connection 'iknow': IO exception during
> indexing: null
> >>> > ERROR 2013-01-14 15:00:12,763 (Worker
> thread '13') -
> >>> Exception tossed: Repeated service
> interruptions - failure
> >>> processing document: null
> >>> >
> >>>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> >>> Repeated service interruptions - failure
> processing
> >>> document: null
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >>> > Caused by:
> >>> org.apache.http.client.ClientProtocolException
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> >>> > Caused by:
> >>>
> org.apache.http.client.NonRepeatableRequestException:
> Cannot
> >>> retry request with a non-repeatable request
> entity.
> >>> The cause lists the reason the original request
> failed.
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>> >         ...
> 6 more
> >>> > Caused by: java.net.SocketException:
> Broken pipe
> >>> >         at
> >>> java.net.SocketOutputStream.socketWrite0(Native
> Method)
> >>> >         at
> >>>
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>> >         at
> >>>
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>> >         at
> >>>
> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>> >         at
> >>>
> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>> >         at
> >>>
> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>> >         at
> >>>
> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>> >         at
> >>>
> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>> >         at
> >>>
> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>> >         at
> >>>
> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>> >         at
> >>>
> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>> >         at
> >>>
> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>> >         at
> >>>
> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>> >         at
> >>>
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>> >         ...
> 8 more
> >>> >
> >>> > Status of Jobs => Error: Unhandled Solr
> exception
> >>> during indexing (0): Server at http://localhost:8983/solr/all returned 
> >>> non ok
> >>> status:413, message:FULL head
> >>> >
> >>> >     
>    ERROR 2013-01-14
> >>> 15:10:42,074 (Worker thread '15') - Exception
> tossed:
> >>> Unhandled Solr exception during indexing (0):
> Server at http://localhost:8983/solr/all returned
> non ok
> >>> status:413, message:FULL head
> >>> >
> >>>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> >>> Unhandled Solr exception during indexing (0):
> Server at http://localhost:8983/solr/all returned
> non ok
> >>> status:413, message:FULL head
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
> >>> >
> >>> > On the solr side I see :
> >>> >
> >>> > INFO: Creating new http client,
> >>>
> config:maxConnections=200&maxConnectionsPerHost=8
> >>> > 2013-01-14
> 15:18:21.775:WARN:oejh.HttpParser:Full
> >>>
> [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616
> >>> ...long long chars ... 2B656B6970{}
> >>> >
> >>> > Thanks,
> >>> > Ahmet
> >>>
>
