[
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088223#comment-14088223
]
Steve Rowe commented on SOLR-3881:
----------------------------------
Vitaliy, thanks for the changes.
I see a few more issues in your latest patch:
# {{LangDetectLanguageIdentifierUpdateProcessor.detectLanguage()}} still uses
{{concatFields()}}, but it shouldn't -- avoiding that was the whole point of
moving it to {{TikaLanguageIdentifierUpdateProcessor}}; instead,
{{LangDetectLanguageIdentifierUpdateProcessor.detectLanguage()}} should loop
over {{inputFields}} and call {{detector.append()}} for each value (similar to
what {{concatFields()}} does).
# {{concatFields()}} and {{getExpectedSize()}} should move to
{{TikaLanguageIdentifierUpdateProcessor}}.
# {{LanguageIdentifierUpdateProcessor.getExpectedSize()}} still takes a
{{maxAppendSize}} param, which didn't get renamed; that param could be removed
entirely, since {{maxFieldValueChars}} is available as a data member.
# There are a bunch of whitespace-only changes in
{{LanguageIdentifierUpdateProcessorFactoryTestCase.java}}. Patches that include
changes like this are significantly harder to review. Your IDE should have
settings that make it stop doing this.
# There is still some import reordering in
{{TikaLanguageIdentifierUpdateProcessor.java}}.
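To make the first point concrete, here is a minimal sketch of the kind of loop I mean: each field value is handed to the detector separately, truncated to a per-value limit, and the loop stops once a total limit is reached, so no large intermediate buffer is ever built. This is not the patch code. {{TextSink}} is a hypothetical stand-in for the detector's {{append(String)}}, and the class, method, and parameter names are assumptions chosen for illustration. Unlike {{concatFields()}}, this sketch does not insert a separator between values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the detector's append(String) method. */
interface TextSink {
    void append(String text);
}

final class CappedAppendSketch {
    /**
     * Feeds field values to the sink one at a time. Each value is truncated to
     * maxFieldValueChars, and appending stops once maxTotalChars characters
     * have been handed over. Returns the number of characters appended.
     */
    static int appendCapped(List<String> fieldValues, TextSink sink,
                            int maxFieldValueChars, int maxTotalChars) {
        int total = 0;
        for (String value : fieldValues) {
            if (value == null || value.isEmpty()) {
                continue; // nothing to contribute
            }
            int remaining = maxTotalChars - total;
            if (remaining <= 0) {
                break; // total budget used up
            }
            int take = Math.min(value.length(), Math.min(maxFieldValueChars, remaining));
            sink.append(value.substring(0, take));
            total += take;
        }
        return total;
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        int appended = appendCapped(Arrays.asList("abcdef", "ghij"),
                                    text -> seen.append(text), 4, 6);
        System.out.println(appended + " chars appended: " + seen);
    }
}
```

In the real processor the sink would be the detector itself, and the two limits would come from the configured per-value and total char settings.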
One last thing:
{quote}
bq. The total chars default should be its own setting; I was thinking we could
make it double the per-value default?
\[VZ] added default value to maxTotalChars and changed both to 10K like in
com.cybozu.labs.langdetect.Detector.maxLength
{quote}
Thanks for adding the total chars default, but you didn't make it double the
field value chars default, as I suggested. I'm not sure which choice is better.
If the user specifies multiple fields and the first one alone uses up the total
char default, it is the only field used to determine the language; is that an
issue? I was thinking that it would be better to visit at least one other
field (hence the idea of total = 2 * per-field), but that wouldn't fully
address the issue. What do you think?
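To illustrate the trade-off, here is a small sketch that counts how many fields contribute text under a given pair of limits. The class and method names are hypothetical, and the field lengths are made up; only the 10K per-value figure comes from the discussion above.

```java
final class FieldBudgetSketch {
    /** Returns how many fields contribute at least one character. */
    static int fieldsConsulted(int[] fieldLengths, int maxFieldValueChars, int maxTotalChars) {
        int total = 0;
        int consulted = 0;
        for (int length : fieldLengths) {
            int remaining = maxTotalChars - total;
            if (remaining <= 0) {
                break; // total budget used up; later fields are never seen
            }
            int take = Math.min(length, Math.min(maxFieldValueChars, remaining));
            if (take > 0) {
                consulted++;
                total += take;
            }
        }
        return consulted;
    }

    public static void main(String[] args) {
        int[] lengths = {50000, 50000, 800}; // made-up field sizes
        // total == per-field: the first field exhausts the budget
        System.out.println(fieldsConsulted(lengths, 10000, 10000));
        // total == 2 * per-field: a second field is visited, the third still is not
        System.out.println(fieldsConsulted(lengths, 10000, 20000));
    }
}
```

With both limits at 10K only the first field is consulted. Doubling the total brings in the second field, but the third is still never reached, which is the sense in which 2 * per-field doesn't fully address the issue.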
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
> Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch,
> SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}