[ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062122#comment-14062122 ]
Steve Rowe commented on SOLR-3881:
----------------------------------
bq. Added string size calculation as string builder capacity. Used to prevent multiple array allocations on append. (Maybe this also needs to be configurable - for large documents only)
[~vzhovtiuk], I agree - I think we should have two configurable limits: max
chars per field value (already in [~tomasflobbe]'s and your updated patches),
and max total chars (not there yet).
Tomás wrote:
bq. Do you think it would make more sense to limit each append (for the different fields) or to limit the total size of the buffer/builder (stop appending fields when the maximum was reached)? Both ways would prevent OOM; however, they could give different results.
I think we should have *both* limits.
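Both limits, plus the up-front capacity idea quoted above, can be combined in a single concatenation pass. This is a minimal sketch, not the code in the attached patches; BoundedConcat, concat, maxPerValue and maxTotal are hypothetical names, and it assumes values are joined with a single space. It also shows the difference Tomás raises: the per-value cap keeps a prefix of every field, while the total cap can drop later fields entirely.

```java
import java.util.List;

// Hypothetical helper, not the SOLR-3881 patch code: concatenates field values
// for language detection under a per-value cap and a total cap.
final class BoundedConcat {

    static String concat(List<String> values, int maxPerValue, int maxTotal) {
        if (maxPerValue < 0 || maxTotal < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        // Size the builder up front (capped at maxTotal) so appends never have to
        // grow and copy the backing array, the path seen in the OOM stack trace.
        long needed = Math.max(0, values.size() - 1); // one space between values
        for (String v : values) {
            needed += Math.min(v.length(), maxPerValue);
        }
        StringBuilder sb = new StringBuilder((int) Math.min(needed, maxTotal));

        for (String v : values) {
            if (sb.length() > 0) {
                // Stop if there is no room for a separator plus at least one char.
                if (sb.length() + 1 >= maxTotal) {
                    break;
                }
                sb.append(' ');
            }
            int remaining = maxTotal - sb.length();
            if (remaining <= 0) {
                break; // total cap reached: later fields are skipped entirely
            }
            // Per-value cap first, then whatever room the total cap leaves.
            int take = Math.min(Math.min(v.length(), maxPerValue), remaining);
            sb.append(v, 0, take);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = List.of("hello world", "bonjour le monde", "hola");
        System.out.println(concat(fields, 5, 12)); // prints hello bonjo
    }
}
```

With maxPerValue = 5 and maxTotal = 12, each field is cut to five chars and the third field never makes it in, so no amount of input can push the buffer past 12 chars.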
I think it's more important, though, to do as [~rcmuir] said earlier in this
issue:
{quote}
The langdetect implementation can append each piece at a time.
It can also take a reader: append(Reader), but that is really just syntactic
sugar that forwards to append(String) without exceeding Detector.max_text_length.
Seems like the concatenating stuff should be pushed out of the base class into
the Tika impl.
{quote}
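The design Robert describes can be sketched without the langdetect jar: the base class only streams each field value to a sink, and the sink itself enforces a total cap, so no unbounded concatenated String is ever built. TextSink, CappedSink and FieldFeeder are hypothetical stand-ins; in the real library the sink role would be played by com.cybozu.labs.langdetect.Detector, which exposes append(String) and setMaxTextLength(int).

```java
import java.util.List;

// Hypothetical stand-in for a detector that accepts text piece by piece,
// in the spirit of langdetect's Detector.append(String).
interface TextSink {
    void append(String piece);
}

// Keeps at most maxTextLength chars in total, however much text is appended,
// so the sink's memory stays bounded even for a huge field value.
final class CappedSink implements TextSink {
    private final int maxTextLength;
    private final StringBuilder kept = new StringBuilder();

    CappedSink(int maxTextLength) {
        if (maxTextLength < 0) {
            throw new IllegalArgumentException("maxTextLength must be non-negative");
        }
        this.maxTextLength = maxTextLength;
    }

    @Override
    public void append(String piece) {
        int room = maxTextLength - kept.length();
        if (room <= 0) {
            return; // cap reached: ignore further input
        }
        kept.append(piece, 0, Math.min(room, piece.length()));
    }

    String text() {
        return kept.toString();
    }
}

final class FieldFeeder {
    // The base class only streams field values to the sink; it never builds
    // one big concatenated String. An implementation that needs a single
    // String (e.g. a Tika-based one) can concatenate inside its own sink.
    static void feed(List<String> values, TextSink sink) {
        boolean first = true;
        for (String v : values) {
            if (!first) {
                sink.append(" "); // keep words from adjacent fields apart
            }
            sink.append(v);
            first = false;
        }
    }

    public static void main(String[] args) {
        CappedSink sink = new CappedSink(8);
        feed(List.of("abcde", "fghij"), sink);
        System.out.println(sink.text()); // prints abcde fg
    }
}
```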
See
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
- the default is 10K chars - we can pass the configured max total chars here.
We should also set finite default maxima for both per-value and total chars,
rather than defaulting to MAX_INT as the current patch does.
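Resolving the two limits with finite fallbacks could look like the sketch below. The parameter keys (maxFieldValueChars, maxTotalChars), the class LangIdLimits, and both default values are illustrative assumptions, not settled choices from this issue. Following the suggestion above, the resolved maxTotalChars is the value that would be handed to the langdetect Detector via setMaxTextLength(int).

```java
import java.util.Map;

// Hypothetical limits holder: finite defaults instead of MAX_INT.
// Parameter names and default values are illustrative assumptions only.
final class LangIdLimits {
    static final int DEFAULT_MAX_FIELD_VALUE_CHARS = 10_000;
    static final int DEFAULT_MAX_TOTAL_CHARS = 20_000;

    final int maxFieldValueChars;
    final int maxTotalChars;

    private LangIdLimits(int maxFieldValueChars, int maxTotalChars) {
        this.maxFieldValueChars = maxFieldValueChars;
        this.maxTotalChars = maxTotalChars;
    }

    // Reads both limits from string params (as from a config section),
    // falling back to the finite defaults when a key is absent.
    static LangIdLimits fromParams(Map<String, String> params) {
        int perValue = positiveOrDefault(params, "maxFieldValueChars", DEFAULT_MAX_FIELD_VALUE_CHARS);
        int total = positiveOrDefault(params, "maxTotalChars", DEFAULT_MAX_TOTAL_CHARS);
        return new LangIdLimits(perValue, total);
    }

    private static int positiveOrDefault(Map<String, String> params, String key, int fallback) {
        String raw = params.get(key);
        if (raw == null) {
            return fallback;
        }
        int value = Integer.parseInt(raw.trim()); // NumberFormatException on non-numeric input
        if (value <= 0) {
            throw new IllegalArgumentException(key + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        LangIdLimits limits = fromParams(Map.of("maxTotalChars", "5000"));
        System.out.println(limits.maxFieldValueChars + " " + limits.maxTotalChars); // prints 10000 5000
    }
}
```

Rejecting non-positive values, rather than treating them as "unlimited", keeps a misconfiguration from silently restoring the unbounded behavior behind this OOM.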
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
> Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures in Solr that cause it to run out of memory
> (OOM). Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)