[
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vitaliy Zhovtyuk updated SOLR-3881:
-----------------------------------
Attachment: SOLR-3881.patch
1. LangDetectLanguageIdentifierUpdateProcessor.detectLanguage() still uses
concatFields(), but it shouldn't – that was the whole point about moving it to
TikaLanguageIdentifierUpdateProcessor; instead,
LangDetectLanguageIdentifierUpdateProcessor.detectLanguage() should loop over
inputFields and call detector.append() (similarly to what concatFields() does).
[VZ] LangDetectLanguageIdentifierUpdateProcessor.detectLanguage() changed back
to the old flow, with a per-field limit and a max total limit on the detector.
Each field value is appended to the detector.
2. concatFields() and getExpectedSize() should move to
TikaLanguageIdentifierUpdateProcessor.
[VZ] Moved to TikaLanguageIdentifierUpdateProcessor. Tests using concatFields()
moved to TikaLanguageIdentifierUpdateProcessorFactoryTest.
3. LanguageIdentifierUpdateProcessor.getExpectedSize() still takes a
maxAppendSize, which didn't get renamed, but that param could be removed
entirely, since maxFieldValueChars is available as a data member.
[VZ] Argument removed.
4. There are a bunch of whitespace changes in
LanguageIdentifierUpdateProcessorFactoryTestCase.java - it makes reviewing
patches significantly harder when they include changes like this. Your IDE
should have settings that make it stop doing this.
[VZ] Whitespace changes removed.
5. There is still some import reordering in
TikaLanguageIdentifierUpdateProcessor.java.
[VZ] Fixed.
One last thing:
The total chars default should be its own setting; I was thinking we could make
it double the per-value default?
[VZ] Added a default value for maxTotalChars and changed both to 10K, as in
com.cybozu.labs.langdetect.Detector.maxLength
Thanks for adding the total chars default, but you didn't make it double the
field value chars default, as I suggested. Not sure if that's better - if the
user specifies multiple fields and the first one is the only one that's used to
determine the language because it's larger than the total char default, is that
an issue? I was thinking that it would be better to visit at least one other
field (hence the idea of total = 2 * per-field), but that wouldn't fully
address the issue. What do you think?
[VZ] I think in most cases there will be only one field, but since both
parameters are optional, we should not restrict the result when only the
per-field limit is specified and it is more than 10K.
Updated the total default value to 20K.
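
For reference, the bounded append flow discussed above could look roughly like
the sketch below. This is only an illustration, not the code in the patch: the
class name BoundedAppendSketch, the method appendBounded() and the constant
names are hypothetical, and a plain StringBuilder stands in for the langdetect
detector. It assumes the defaults mentioned above (10K per field value, 20K
total) and appends only a bounded slice of each value instead of concatenating
all field values first.

```java
import java.util.List;

/** Hypothetical illustration only; not the Solr or langdetect API. */
final class BoundedAppendSketch {
    // Defaults as discussed in this comment: 10K per field value, 20K in total.
    static final int DEFAULT_MAX_FIELD_VALUE_CHARS = 10_000;
    static final int DEFAULT_MAX_TOTAL_CHARS = 20_000;

    /**
     * Appends each value to the sink, taking at most maxFieldValueChars from
     * any single value and stopping once maxTotalChars have been appended.
     * Returns the number of chars actually appended.
     */
    static int appendBounded(List<String> values, int maxFieldValueChars,
                             int maxTotalChars, StringBuilder sink) {
        int total = 0;
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                continue;
            }
            int remaining = maxTotalChars - total;
            if (remaining <= 0) {
                break; // total budget used up, later fields are not visited
            }
            int len = Math.min(value.length(), Math.min(maxFieldValueChars, remaining));
            if (len <= 0) {
                continue;
            }
            // Copies only the slice [0, len); the full value is never duplicated.
            sink.append(value, 0, len);
            total += len;
        }
        return total;
    }
}
```

With these defaults, a first field larger than 10K is cut at 10K, so a second
field still contributes up to 10K before the total limit is reached. If the
total limit equals the per-field limit, a large first field uses up the whole
budget and no other field is visited.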
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
> Reporter: Rob Tulloh
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch,
> SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}