[ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062122#comment-14062122 ]

Steve Rowe commented on SOLR-3881:
----------------------------------

bq. Added string size calculation as string builder capacity. Used to prevent 
multiple array allocation on append. (Maybe also need to be configurable - for 
large documents only)

[~vzhovtiuk], I agree. I think we should have two configurable limits: max 
chars per field value (already in [~tomasflobbe]'s and your updated patches) 
and max total chars (not there yet).
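The pre-sizing idea from the quoted patch can be combined with the two limits, so the builder is allocated once at a bounded size instead of growing through repeated copies. This is a minimal sketch, not the patch's code: the class `CapacityEstimate`, the method `estimateCapacity`, and the one-space separator per value are hypothetical assumptions.

```java
import java.util.List;

// Hypothetical helper (not from the attached patches): estimates a bounded
// StringBuilder capacity for the concatenated language-detection text.
final class CapacityEstimate {
    private CapacityEstimate() {}

    static int estimateCapacity(List<String> values, int maxCharsPerValue, int maxTotalChars) {
        long total = 0;
        for (String v : values) {
            if (v == null) {
                continue;
            }
            // Each value contributes at most maxCharsPerValue chars, plus one
            // char for an assumed space separator (may overestimate by one).
            total += Math.min(v.length(), maxCharsPerValue) + 1;
            if (total >= maxTotalChars) {
                // Never reserve more than the total cap, however large the document.
                return maxTotalChars;
            }
        }
        return (int) total;
    }
}
```

The result would be used as `new StringBuilder(CapacityEstimate.estimateCapacity(values, perValueCap, totalCap))`, so a very large document can no longer force the buffer to grow without bound.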

Tomás wrote:
bq. Do you think it would make more sense to limit each append (for the 
different fields) or to limit the total size of the buffer/builder (stop 
appending fields when the maximum was reached)? Both ways would prevent OOM, 
however they could give different results.

I think we should have *both* limits.
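To make the difference Tomás describes concrete, here is a sketch of a concatenation routine that applies both limits: each value is truncated to `maxCharsPerValue`, and appending stops once `maxTotalChars` is reached. The names `LimitedConcat` and `concatWithLimits` and the single-space separator are hypothetical, chosen for illustration only.

```java
import java.util.List;

// Hypothetical sketch (not the code in the attached patches) applying both limits.
final class LimitedConcat {
    private LimitedConcat() {}

    static String concatWithLimits(List<String> values, int maxCharsPerValue, int maxTotalChars) {
        StringBuilder sb = new StringBuilder(Math.min(maxTotalChars, 1024));
        for (String v : values) {
            if (v == null || v.isEmpty()) {
                continue;
            }
            // A separator is needed only between values, never before the first.
            int sep = sb.length() > 0 ? 1 : 0;
            int remaining = maxTotalChars - sb.length() - sep;
            if (remaining <= 0) {
                break; // total limit reached: remaining fields are skipped
            }
            if (sep == 1) {
                sb.append(' ');
            }
            // Per-value limit first, then whatever room the total limit leaves.
            int take = Math.min(Math.min(v.length(), maxCharsPerValue), remaining);
            sb.append(v, 0, take);
        }
        return sb.toString();
    }
}
```

With the values `"hello world"` and `"bonjour"`, a per-value cap of 5 alone keeps a prefix of every field, while a total cap of 8 alone lets the first field crowd out the second; applying both bounds each field and the whole buffer.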

I think it's more important, though, to do as [~rcmuir] said earlier in this 
issue: 

{quote}
The langdetect implementation can append each piece at a time.

It can also take reader: append(Reader), but that is really just syntactic 
sugar forwarding to append(String)
and not exceeding the Detector.max_text_length.

Seems like the concatenating stuff should be pushed out of the base class into 
the Tika impl.
{quote}

See the langdetect Javadoc for Detector.setMaxTextLength(int):
http://language-detection.googlecode.com/svn/trunk/doc/com/cybozu/labs/langdetect/Detector.html#setMaxTextLength(int)
Its default is 10K chars; we can pass the configured max total chars there.

We should also set finite default maxima for both the per-value and total 
char limits, rather than defaulting to MAX_INT as the current patch does.
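Finite defaults could be read roughly as below. This is a sketch under stated assumptions: the parameter names `maxFieldValueChars` and `maxTotalChars` and the default values 10000 and 20000 are hypothetical placeholders, not settled configuration.

```java
import java.util.Map;

// Hypothetical parameter names and defaults, for illustration only.
final class LangIdLimits {
    static final String MAX_FIELD_VALUE_CHARS = "maxFieldValueChars";
    static final String MAX_TOTAL_CHARS = "maxTotalChars";
    static final int DEFAULT_MAX_FIELD_VALUE_CHARS = 10000;
    static final int DEFAULT_MAX_TOTAL_CHARS = 20000;

    final int maxFieldValueChars;
    final int maxTotalChars;

    LangIdLimits(Map<String, String> params) {
        this.maxFieldValueChars =
            readPositiveInt(params, MAX_FIELD_VALUE_CHARS, DEFAULT_MAX_FIELD_VALUE_CHARS);
        this.maxTotalChars =
            readPositiveInt(params, MAX_TOTAL_CHARS, DEFAULT_MAX_TOTAL_CHARS);
    }

    // Falls back to a finite default when the parameter is absent and rejects
    // non-positive values instead of silently accepting them.
    private static int readPositiveInt(Map<String, String> params, String name, int def) {
        String raw = params.get(name);
        if (raw == null) {
            return def;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive, got " + value);
        }
        return value;
    }
}
```

The resulting `maxTotalChars` is the value that would be handed to langdetect's `Detector.setMaxTextLength(int)`, as suggested in the comment.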

> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing Solr fail frequently with OOM errors. Here is the stack trace 
> we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
