[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088223#comment-14088223
 ] 

Steve Rowe commented on SOLR-3881:
----------------------------------

Vitaliy, thanks for the changes.

I see a few more issues in your latest patch:

# {{LangDetectLanguageIdentifierUpdateProcessor.detectLanguage()}} still uses 
{{concatFields()}}, but it shouldn't -- that was the whole point about moving 
it to {{TikaLanguageIdentifierUpdateProcessor}}; instead,  
{{LangDetectLanguageIdentifierUpdateProcessor.detectLanguage()}} should loop 
over {{inputFields}} and call {{detector.append()}} (similarly to what 
{{concatFields()}} does).
# {{concatFields()}} and {{getExpectedSize()}} should move to 
{{TikaLanguageIdentifierUpdateProcessor}}.
# {{LanguageIdentifierUpdateProcessor.getExpectedSize()}} still takes a 
{{maxAppendSize}}, which didn't get renamed, but that param could be removed 
entirely, since {{maxFieldValueChars}} is available as a data member.
# There are a bunch of whitespace changes in 
{{LanguageIdentifierUpdateProcessorFactoryTestCase.java}} - it makes reviewing 
patches significantly harder when they include changes like this.  Your IDE 
should have settings that make it stop doing this.
# There is still some import reordering in 
{{TikaLanguageIdentifierUpdateProcessor.java}}.

One last thing:

{quote}
bq. The total chars default should be its own setting; I was thinking we could 
make it double the per-value default?

\[VZ] added default value to maxTotalChars and changed both to 10K like in 
com.cybozu.labs.langdetect.Detector.maxLength
{quote}

Thanks for adding the total chars default, but you didn't make it double the 
field value chars default, as I suggested.  Not sure if that's better - if the 
user specifies multiple fields and the first one is the only one that's used to 
determine the language because it's larger than the total char default, is that 
an issue?  I was thinking that it would be better to visit at least one other 
field (hence the idea of total = 2 * per-field), but that wouldn't fully 
address the issue.  What do you think?

> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch, 
> SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to