Hi Karl,
I think people may want to index the content of aspx files, so treating them
specially may not be a good solution.
In our environment, aspx files are used to construct a web site that is used
internally; in my understanding this is one of the use cases of SharePoint. In
our case the content of the aspx files is fetched from a List, so we can access
it through the List. The files don't have HTML tags etc. in them.
But I am not sure whether this is a common usage of aspx and Lists.
I was thinking of some option like "index only metadata" that simply ignores
the document itself.
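Until then, I approximate this on the Solr side by remapping the extracted body text into an ignored field. A sketch of how that could be set as a default in solrconfig.xml (the handler path and the content_ignored field are from my setup, so treat them as assumptions):

```xml
<!-- Sketch: make the extract handler drop body text by default.
     Assumes "content_ignored" matches an ignored dynamic field, e.g.
     <dynamicField name="*_ignored" type="ignored" multiValued="true"/> -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- route extracted body text into a field Solr discards -->
    <str name="fmap.content">content_ignored</str>
  </lst>
</requestHandler>
```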
By the way, I checked some of the skipped aspx files; their sizes are not too
big: 101 KB, 139 KB, etc.
I suspect some other factor is triggering this. Also, I am seeing this weird
warning on the Jetty that runs Solr:
WARN:oejh.HttpParser:Full [1771440721,-1,m=5,g=6144,p=6144,c=6144]={2F73
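In case the problem is the request header buffer filling up (the 6144 values in that line look like a ~6 KB header buffer), I may try raising it in Solr's etc/jetty.xml. A sketch, assuming the Jetty config that ships with Solr 4.x; the connector class and setter depend on the bundled Jetty version:

```xml
<!-- Sketch for Solr's example etc/jetty.xml: raise the request header
     buffer on the existing connector. The connector class below is
     illustrative; use whatever connector your jetty.xml already defines. -->
<New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
  <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
  <Set name="requestHeaderSize">65536</Set>
</New>
```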
Thanks,
Ahmet
--- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> From: Karl Wright <[email protected]>
> Subject: Re: Repeated service interruptions - failure processing document: null
> To: [email protected]
> Date: Monday, January 14, 2013, 6:46 PM
> Hi Ahmet,
>
> We could specifically treat .aspx files specially, so that they are
> considered to never have any content. But are there cases where
> someone might want to index any content that these URLs might return?
> Specifically, what do .aspx "files" typically contain, when found in a
> SharePoint hierarchy?
>
> Karl
>
> On Mon, Jan 14, 2013 at 11:37 AM, Ahmet Arslan <[email protected]>
> wrote:
> > Hi Karl,
> >
> > Now 39 aspx files (out of 130) are indexed. The job didn't get killed. No exceptions in the log.
> >
> > I increased the maximum POST size of solr/jetty, but that number (39) didn't increase.
> >
> > I will check the size of the remaining 130 - 39 *.aspx files.
> >
> > Actually, I am mapping the extracted content of these aspx files to an ignored dynamic field (fmap.content=content_ignored). I don't use them; I am only interested in the metadata of these aspx files. It would be great if there were a setting to just grab metadata, similar to Lists.
> >
> > Thanks,
> > Ahmet
> >
> > --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> >
> >> From: Karl Wright <[email protected]>
> >> Subject: Re: Repeated service interruptions - failure processing document: null
> >> To: [email protected]
> >> Date: Monday, January 14, 2013, 5:46 PM
> >> I checked in a fix for this ticket on trunk. Please let me know if it
> >> resolves this issue.
> >>
> >> Karl
> >>
> >> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <[email protected]> wrote:
> >> > This is because httpclient is retrying on error three times by
> >> > default. This has to be disabled in the Solr connector, or the rest
> >> > of the logic won't work right.
> >> >
> >> > I've opened a ticket (CONNECTORS-610) for this problem too.
> >> >
> >> > Karl
> >> >
> >> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <[email protected]> wrote:
> >> >> Hi Karl,
> >> >>
> >> >> Thanks for quick fix.
> >> >>
> >> >> I am still seeing the following error after 'svn up' and 'ant build':
> >> >>
> >> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
> >> >> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> >> >>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >> >> Caused by: org.apache.http.client.ClientProtocolException
> >> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >> >>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >> >>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
> >> >> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
> >> >>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >> >>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >> >>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >> >>     ... 6 more
> >> >> Caused by: java.net.SocketException: Broken pipe
> >> >>     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >> >>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >> >>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >> >>     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >> >>     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >> >>     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >> >>     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >> >>     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >> >>     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >> >>     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >> >>     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >> >>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >> >>     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >> >>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >> >>     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >> >>     ... 8 more
> >> >>
> >> >>
> >> >>
> >> >> --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> >> >>
> >> >>> From: Karl Wright <[email protected]>
> >> >>> Subject: Re: Repeated service interruptions - failure processing document: null
> >> >>> To: [email protected]
> >> >>> Date: Monday, January 14, 2013, 3:30 PM
> >> >>> Hi Ahmet,
> >> >>>
> >> >>> The exception that seems to be causing the abort is a socket exception
> >> >>> coming from a socket write:
> >> >>>
> >> >>> > Caused by: java.net.SocketException: Broken pipe
> >> >>>
> >> >>> This makes sense in light of the http code returned from Solr, which
> >> >>> was 413: http://www.checkupdown.com/status/E413.html .
> >> >>>
> >> >>> So there is nothing actually *wrong* with the .aspx documents, but
> >> >>> they are just way too big, and Solr is rejecting them for that reason.
> >> >>>
> >> >>> Clearly, though, the Solr connector should recognize this code as
> >> >>> meaning "never retry", so instead of killing the job, it should just
> >> >>> skip the document. I'll open a ticket for that now.
> >> >>>
> >> >>> Karl
> >> >>>
> >> >>>
> >> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <[email protected]> wrote:
> >> >>> > Hello,
> >> >>> >
> >> >>> > I am indexing a SharePoint 2010 instance using mcf-trunk (at revision 1432907).
> >> >>> >
> >> >>> > There is no problem with a Document library that contains Word, Excel, etc.
> >> >>> >
> >> >>> > However, I receive the following errors with a Document library that has *.aspx files in it.
> >> >>> >
> >> >>> > Status of Jobs => Error: Repeated service interruptions - failure processing document: null
> >> >>> >
> >> >>> > WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
> >> >>> > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
> >> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> >> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >> >>> > Caused by: org.apache.http.client.ClientProtocolException
> >> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >> >>> >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> >> >>> > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
> >> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >> >>> >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >> >>> >     ... 6 more
> >> >>> > Caused by: java.net.SocketException: Broken pipe
> >> >>> >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> >> >>> >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >> >>> >     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >> >>> >     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >> >>> >     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >> >>> >     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >> >>> >     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >> >>> >     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >> >>> >     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >> >>> >     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >> >>> >     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >> >>> >     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >> >>> >     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >> >>> >     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >> >>> >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >> >>> >     ... 8 more
> >> >>> >
> >> >>> > Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >> >>> >
> >> >>> > ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >> >>> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> >> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
> >> >>> >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
> >> >>> >     at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
> >> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
> >> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
> >> >>> >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
> >> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
> >> >>> >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
> >> >>> >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >> >>> >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
> >> >>> >
> >> >>> > On the solr side I see:
> >> >>> >
> >> >>> > INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
> >> >>> > 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Ahmet
> >> >>>
> >>
>