[
https://issues.apache.org/jira/browse/SOLR-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844855#comment-13844855
]
Chris commented on SOLR-5440:
-----------------------------
I have been busy with an international relocation, but I am back up and running
and have returned to this.
The infinite loop occurs within the loop on line 4305. I have found the
offending text: it is not email addresses, but rather the source code of an
HTML page that has been URL-encoded. It should be relatively easy to reproduce
(URL-encode this page's source code, for example). If you need the exact text I
am using, I can provide it privately.
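For anyone trying to reproduce this, input of the kind described above could
be generated roughly as follows. This is only a sketch: the class and method
names are hypothetical, and there is no guarantee that any particular page
triggers the hang once the result is fed through UAX29URLEmailTokenizer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not part of Solr or Lucene) that URL-encodes HTML source. */
final class UrlEncodedHtmlSample {

    /** Returns the given HTML source URL-encoded as UTF-8. */
    static String encode(String html) {
        try {
            // Form-style encoding: spaces become '+', reserved characters become %XX.
            return URLEncoder.encode(html, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The encoded string would then be indexed into a field whose analyzer uses
UAX29URLEmailTokenizer, ideally on a test node rather than a live cluster.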
As a stopgap, since this text would never be searched, I'm detecting it and
not pushing it up to Solr.
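One way such a client-side check might look is sketched below. It is an
illustration only, not the exact check used here: the class name, the minimum
length, and the 10% threshold are all assumptions chosen for the example.

```java
/** Hypothetical pre-indexing filter; thresholds are illustrative assumptions. */
final class UrlEncodedTextDetector {

    // Assumed cutoffs: ignore short strings, flag text that is at least 10% escapes.
    private static final int MIN_LENGTH = 64;
    private static final double MIN_ESCAPE_RATIO = 0.10;

    /** Returns true if a large share of the text consists of %XX escape sequences. */
    static boolean looksUrlEncoded(String text) {
        if (text == null || text.length() < MIN_LENGTH) {
            return false;
        }
        int escapedChars = 0;
        int i = 0;
        while (i + 2 < text.length()) {
            if (text.charAt(i) == '%'
                    && isHexDigit(text.charAt(i + 1))
                    && isHexDigit(text.charAt(i + 2))) {
                escapedChars += 3; // count all three characters of the escape
                i += 3;
            } else {
                i++;
            }
        }
        return (double) escapedChars / text.length() >= MIN_ESCAPE_RATIO;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```

A document for which looksUrlEncoded returns true would simply be skipped
before the update request is built.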
To answer the questions above: I'm running on Linux with JVM version 7 update
25, docs range in size from 10KB to 4MB, and I'm not passing any flags to the
tokenizer.
> UAX29URLEmailTokenizer thread hangs on getNextToken - causes cloud to stop
> accepting updates
> --------------------------------------------------------------------------------------------
>
> Key: SOLR-5440
> URL: https://issues.apache.org/jira/browse/SOLR-5440
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.5
> Reporter: Chris
>
> This is a pretty nasty bug, and causes the cluster to stop accepting updates.
> I'm not sure how to consistently reproduce it but I have done so numerous
> times. Switching to a whitespace tokenizer improved indexing speed, and I
> never got the issue again.
> I'm running a 4.6 Snapshot - I had issues with deadlocks with numerous
> versions of Solr, and have finally narrowed down the problem to this code,
> which affects many/all(?) versions of Solr.
> When the thread hits this issue it uses 100% CPU. Restarting the node which
> has the error allows indexing to continue until the issue is hit again. Here
> is the thread dump:
> http-bio-8080-exec-45 (201)
> org.apache.lucene.analysis.standard.UAX29URLEmailTokenizerImpl.getNextToken(UAX29URLEmailTokenizerImpl.java:4343)
> org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer.incrementToken(UAX29URLEmailTokenizer.java:147)
> org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)
> org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
> org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1517)
> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
> org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:583)
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:719)
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:449)
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> java.lang.Thread.run(Unknown Source)