Hello,

I increased these settings of jetty/solr:

<Set type="java.lang.Integer" name="requestHeaderSize">147483647</Set>
<Set type="java.lang.Integer" name="requestBufferSize">147483647</Set>

Now I can index all 130 aspx files (with all metadata) with security disabled.

Thanks,
Ahmet

--- On Mon, 1/14/13, Ahmet Arslan <[email protected]> wrote:

> From: Ahmet Arslan <[email protected]>
> Subject: Re: Repeated service interruptions - failure processing document: null
> To: [email protected]
> Date: Monday, January 14, 2013, 11:19 PM
>
> Hi Karl,
>
> I tracked the problem down to this: one of the metadata fields is causing it.
> If I select only the ID metadata field (normally I select all of these:
> Created, FileLeafRef, ID, IKAccessGroup, IKContentType, IKDocuments,
> IKExpertise, IKExplanation, IKFAQ, IKImportant, Modified, Title), all aspx
> files are indexed successfully.
>
> So the contentStreamUpdateRequest.addContentStream(new
> RepositoryDocumentStream(is, length)); part is not the problem.
>
> I suspect one of the metadata fields is very long (some metadata fields have
> HTML tags in them). Is there a limitation on modifiable Solr params?
> &literal.IKFAQ="very long text that contains some html tags"
>
> I will investigate further.
>
> Thanks,
> Ahmet
>
> --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
>
> > From: Karl Wright <[email protected]>
> > Subject: Re: Repeated service interruptions - failure processing document: null
> > To: [email protected]
> > Date: Monday, January 14, 2013, 10:34 PM
> >
> > Let's try to figure out why we can't index streamed data from these .aspx
> > files. Can you add enough debugging output to figure out what the connector
> > is actually trying to stream to Solr? In order to do that you may well need
> > to write a class that wraps the input stream that is handed to Solr with
> > one that outputs enough information for us to make sense of this.
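[Editor's note: the wrapper class Karl suggests might look roughly like the sketch below. The class name and log wording are hypothetical, not taken from the connector source; the idea is just to count and log the bytes actually handed to Solr so the total can be compared with the content length the connector reported.]

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical debugging wrapper: counts every byte read from the wrapped
// stream and reports the running total when the stream is closed, so the
// amount actually streamed to Solr can be compared with the declared length.
class CountingDebugInputStream extends FilterInputStream {
    private long bytesRead = 0;

    CountingDebugInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) bytesRead++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    long getBytesRead() {
        return bytesRead;
    }

    @Override
    public void close() throws IOException {
        // Debug output: total number of bytes actually handed to Solr.
        System.err.println("Document stream closed after " + bytesRead + " bytes");
        super.close();
    }
}
```

Wrapping the existing stream, e.g. `new CountingDebugInputStream(new RepositoryDocumentStream(is, length))`, would leave the upload path unchanged while logging how much data is really sent.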
> > What might be happening is that the content length is missing or wrong,
> > and as a result the transfer just keeps going, or something.
> >
> > Karl
> >
> > On Mon, Jan 14, 2013 at 3:23 PM, Ahmet Arslan <[email protected]> wrote:
> >
> > > Hi Karl,
> > >
> > > I think people may want to index the content of aspx files, so treating
> > > them specially may not be a good solution.
> > >
> > > In our environment, aspx files are used to construct a web site that is
> > > used internally. In my understanding this is one of the use cases of
> > > SharePoint. In our case the content of the aspx files is fetched from a
> > > List; we can access it from the List. They don't have HTML tags etc. in
> > > them. But I am not sure if this is common usage of aspx and Lists.
> > >
> > > I was thinking of some option like "index only metadata" that simply
> > > ignores the document itself.
> > >
> > > By the way, I checked some of the skipped aspx files; their sizes are not
> > > too big: 101 KB, 139 KB, etc. I suspect some other factor is triggering
> > > this. Also I am seeing this weird warning on the jetty that runs Solr:
> > >
> > > WARN:oejh.HttpParser:Full [1771440721,-1,m=5,g=6144,p=6144,c=6144]={2F73
> > >
> > > Thanks,
> > > Ahmet
> > >
> > > --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> > >
> > > > From: Karl Wright <[email protected]>
> > > > Subject: Re: Repeated service interruptions - failure processing document: null
> > > > To: [email protected]
> > > > Date: Monday, January 14, 2013, 6:46 PM
> > > >
> > > > Hi Ahmet,
> > > >
> > > > We could treat .aspx files specially, so that they are considered to
> > > > never have any content. But are there cases where someone might want to
> > > > index any content that these URLs might return? Specifically, what do
> > > > .aspx "files" typically contain, when found in a SharePoint hierarchy?
> > > > Karl
> > > >
> > > > On Mon, Jan 14, 2013 at 11:37 AM, Ahmet Arslan <[email protected]> wrote:
> > > >
> > > > > Hi Karl,
> > > > >
> > > > > Now 39 aspx files (out of 130) are indexed. The job didn't get
> > > > > killed, and there are no exceptions in the log.
> > > > >
> > > > > I increased the maximum POST size of solr/jetty, but that number of
> > > > > 39 didn't increase.
> > > > >
> > > > > I will check the size of the remaining 130 - 39 *.aspx files.
> > > > >
> > > > > Actually I am mapping the extracted content of these aspx files to an
> > > > > ignored dynamic field (fmap.content=content_ignored); I don't use
> > > > > them. I am only interested in the metadata of these aspx files. It
> > > > > would be great if there were a setting to just grab metadata, similar
> > > > > to Lists.
> > > > >
> > > > > Thanks,
> > > > > Ahmet
> > > > >
> > > > > --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> > > > >
> > > > > > From: Karl Wright <[email protected]>
> > > > > > Subject: Re: Repeated service interruptions - failure processing document: null
> > > > > > To: [email protected]
> > > > > > Date: Monday, January 14, 2013, 5:46 PM
> > > > > >
> > > > > > I checked in a fix for this ticket on trunk. Please let me know if
> > > > > > it resolves this issue.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > > On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <[email protected]> wrote:
> > > > > >
> > > > > > > This is because httpclient retries on error three times by
> > > > > > > default. This has to be disabled in the Solr connector, or the
> > > > > > > rest of the logic won't work right.
> > > > > > >
> > > > > > > I've opened a ticket (CONNECTORS-610) for this problem too.
> > > > > > > Karl
> > > > > > >
> > > > > > > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > Thanks for the quick fix.
> > > > > > > >
> > > > > > > > I am still seeing the following error after 'svn up' and 'ant build':
> > > > > > > >
> > > > > > > > ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
> > > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> > > > > > > >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> > > > > > > > Caused by: org.apache.http.client.ClientProtocolException
> > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> > > > > > > >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > > > > >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > > > > >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> > > > > > > >     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
> > > > > > > > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
> > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> > > > > > > >     ... 6 more
> > > > > > > > Caused by: java.net.SocketException: Broken pipe
> > > > > > > >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> > > > > > > >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> > > > > > > >     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> > > > > > > >     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> > > > > > > >     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> > > > > > > >     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> > > > > > > >     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> > > > > > > >     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> > > > > > > >     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> > > > > > > >     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> > > > > > > >     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> > > > > > > >     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> > > > > > > >     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> > > > > > > >     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> > > > > > > >     ... 8 more
> > > > > > > >
> > > > > > > > --- On Mon, 1/14/13, Karl Wright <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > From: Karl Wright <[email protected]>
> > > > > > > > > Subject: Re: Repeated service interruptions - failure processing document: null
> > > > > > > > > To: [email protected]
> > > > > > > > > Date: Monday, January 14, 2013, 3:30 PM
> > > > > > > > >
> > > > > > > > > Hi Ahmet,
> > > > > > > > >
> > > > > > > > > The exception that seems to be causing the abort is a socket
> > > > > > > > > exception coming from a socket write:
> > > > > > > > >
> > > > > > > > > > Caused by: java.net.SocketException: Broken pipe
> > > > > > > > >
> > > > > > > > > This makes sense in light of the http code returned from Solr,
> > > > > > > > > which was 413: http://www.checkupdown.com/status/E413.html .
> > > > > > > > >
> > > > > > > > > So there is nothing actually *wrong* with the .aspx documents;
> > > > > > > > > they are just way too big, and Solr is rejecting them for
> > > > > > > > > that reason.
> > > > > > > > > Clearly, though, the Solr connector should recognize this
> > > > > > > > > code as meaning "never retry", so instead of killing the job
> > > > > > > > > it should just skip the document. I'll open a ticket for that
> > > > > > > > > now.
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > > On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > I am indexing a SharePoint 2010 instance using mcf-trunk (at revision 1432907).
> > > > > > > > > >
> > > > > > > > > > There is no problem with a Document library that contains Word, Excel, etc.
> > > > > > > > > >
> > > > > > > > > > However, I receive the following errors with a Document library that has *.aspx files in it.
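[Editor's note: Karl's point above, that a 413 is a permanent per-document rejection rather than a transient failure, could be sketched like this. This is a hypothetical policy helper, not the actual ManifoldCF fix; names are invented for illustration.]

```java
// Hypothetical sketch of the retry policy Karl describes for the Solr
// connector: a 4xx status such as 413 (Request Entity Too Large) means the
// same document will fail every time, so the right response is to skip the
// document; server-side failures remain retryable.
public class IngestRetryPolicy {
    public enum Action { ACCEPT, SKIP_DOCUMENT, RETRY }

    public static Action classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300)
            return Action.ACCEPT;         // indexed successfully
        if (httpStatus >= 400 && httpStatus < 500)
            return Action.SKIP_DOCUMENT;  // permanent, document-specific (e.g. 413)
        return Action.RETRY;              // 5xx and everything else: try again later
    }
}
```

Under such a policy the job keeps running and only the oversized document is dropped, instead of the whole crawl aborting with "Repeated service interruptions".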
> > > > > > > > > > Status of Jobs => Error: Repeated service interruptions - failure processing document: null
> > > > > > > > > >
> > > > > > > > > > WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
> > > > > > > > > > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
> > > > > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
> > > > > > > > > >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> > > > > > > > > > Caused by: org.apache.http.client.ClientProtocolException
> > > > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> > > > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> > > > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> > > > > > > > > >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> > > > > > > > > >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> > > > > > > > > >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> > > > > > > > > >     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> > > > > > > > > > Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
> > > > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> > > > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> > > > > > > > > >     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> > > > > > > > > >     ... 6 more
> > > > > > > > > > Caused by: java.net.SocketException: Broken pipe
> > > > > > > > > >     at java.net.SocketOutputStream.socketWrite0(Native Method)
> > > > > > > > > >     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> > > > > > > > > >     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> > > > > > > > > >     at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> > > > > > > > > >     at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> > > > > > > > > >     at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> > > > > > > > > >     at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> > > > > > > > > >     at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> > > > > > > > > >     at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> > > > > > > > > >     at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> > > > > > > > > >     at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> > > > > > > > > >     at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> > > > > > > > > >     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> > > > > > > > > >     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> > > > > > > > > >     at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> > > > > > > > > >     ... 8 more
> > > > > > > > > >
> > > > > > > > > > Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> > > > > > > > > >
> > > > > > > > > > ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> > > > > > > > > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> > > > > > > > > >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
> > > > > > > > > >     at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
> > > > > > > > > >     at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
> > > > > > > > > >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
> > > > > > > > > >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
> > > > > > > > > >     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
> > > > > > > > > >     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
> > > > > > > > > >     at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
> > > > > > > > > >     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> > > > > > > > > >     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
> > > > > > > > > >
> > > > > > > > > > On the solr side I see:
> > > > > > > > > >
> > > > > > > > > > INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
> > > > > > > > > > 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Ahmet
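[Editor's note: putting the thread's diagnosis together — metadata values travel as literal.<field> request parameters (as in Ahmet's &literal.IKFAQ=... example), so one HTML-heavy field can overflow Jetty's default 6144-byte header buffer, the g=6144 visible in the HttpParser:Full warnings, and raising requestHeaderSize/requestBufferSize as in the top message lifts that limit. The sketch below is a rough stdlib-only illustration; the class, its byte accounting, and the sample sizes are assumptions for illustration, not connector code.]

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Rough illustration of why a single long metadata value can trigger
// "status:413, message:FULL head": each literal.<field>=<value> pair is
// URL-encoded into the request, and Jetty rejects requests whose header
// section exceeds its buffer (6144 bytes by default, per g=6144 above).
public class HeaderBudget {
    // Jetty's default header buffer size, matching the warning in the thread.
    static final int DEFAULT_HEADER_BUFFER = 6144;

    // Approximate bytes one "&literal.<field>=<encoded value>" pair adds.
    static int encodedParamSize(String field, String value) throws UnsupportedEncodingException {
        return 1 + ("literal." + field).length() + 1 + URLEncoder.encode(value, "UTF-8").length();
    }

    static boolean fitsDefaultBuffer(String field, String value) throws UnsupportedEncodingException {
        return encodedParamSize(field, value) <= DEFAULT_HEADER_BUFFER;
    }
}
```

By this accounting, an IKFAQ value of a few kilobytes of HTML already exhausts the default buffer on its own, which is consistent with indexing succeeding once only the short ID field was selected.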
