[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Zhovtyuk updated SOLR-3881:
-----------------------------------

    Attachment: SOLR-3881.patch

About moving concatFields() to the tika language identifier: I think the way to 
go is just move the whole method there, then change the detectLanguage() method 
to take the SolrInputDocument instead of a String. You don't need to carry over 
the field[] parameter from concatFields(), since data member inputFields will 
be accessible everywhere it's needed.
[VZ] This call looks more cleaner now, i changed inputFields to private now to 
reduce visibility scope

I should have mentioned previously: I don't like the maxAppendSize and 
maxTotalAppendSize names - "size" is ambiguous (could refer to bytes, chars, 
whatever), and "append" refers to an internal operation... I'd like to see 
"append"=>"field value" and "size"=>"chars": maxFieldValueChars, and 
maxTotalChars (since appending doesn't need to be mentioned for the global 
limit). The same thing goes for the default constants and the test method names.
[VZ] Renamed parameters and test methods

Some minor issues I found with your patch:
As I said previously: "We should also set default maxima for both per-value and 
total chars, rather than MAX_INT, as in the current patch."
The total chars default should be its own setting; I was thinking we could make 
it double the per-value default?
[VZ] added default value to maxTotalChars and changed both to 10K like in 
com.cybozu.labs.langdetect.Detector.maxLength

It's better not to reorder import statements unless you're already making 
significant changes to them; it distracts from the meat of the change. (You 
reordered them in LangDetectLanguageIdentifierUpdateProcessor and 
LanguageIdentifierUpdateProcessorFactoryTestCase)
[VZ] This is IDE optimization to put imports in alphabetical order - restored 
it to original order

In LanguageIdentifierUpdateProcessor.concatFields(), when you trim the 
concatenated text to maxTotalAppendSize, I think 
StringBuilder.setLength(maxTotalAppendSize); would be more efficient than 
StringBuilder.delete(maxTotalAppendSize, sb.length() - 1);
[VZ] Yep, cleaned that 

In addition to the test you added for the global limit, we should also test 
using both the per-value and global limits at the same time.
[VZ] Tests for both limits added

> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch, 
> SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to