[
https://issues.apache.org/jira/browse/SOLR-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177642#comment-13177642
]
Eric Pugh commented on SOLR-2990:
---------------------------------
I have found that Solr Cell is great for small numbers of documents, or for
quick prototyping. But as you scale up in either the number or the complexity
of documents, it becomes a bottleneck. The Tika CLI is very easy to use, and
you can throw more resources at Tika extraction if you run it outside of Solr
and then just send the text in, rather than extracting inside of Solr. There is
also less risk that you bring down Solr! I wonder if we should put something in
the wiki recommending that, if you start having problems with Solr Cell, you
move to running Tika outside of Solr, and maybe include some sample code?
Solr Cell is an awesome feature, but it can also cut you!
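
As a rough idea of what such sample code might look like, here is a minimal
Python sketch that runs the Tika CLI in its own JVM and then posts the
extracted text to Solr as JSON. The jar path, the update URL, the "text" field
name, and the heap and timeout values are all assumptions for illustration, not
anything taken from this issue; check `--help` for your Tika version and your
own schema and update handler configuration before relying on any of it.

```python
import json
import subprocess
import urllib.request

# Assumed locations; adjust for your installation.
TIKA_JAR = "tika-app.jar"
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def tika_command(path, jar=TIKA_JAR, heap="512m"):
    """Build the Tika CLI command line for plain-text extraction."""
    # The extraction JVM gets its own heap cap, independent of Solr's -Xmx.
    return ["java", "-Xmx" + heap, "-jar", jar,
            "--encoding=UTF-8", "--text", path]


def extract_text(path, timeout=120):
    """Run Tika in a child process and return the extracted text.

    Raises subprocess.CalledProcessError if Tika exits with a non-zero
    status, and subprocess.TimeoutExpired if it runs past the timeout.
    Either way, the Solr JVM itself is not involved.
    """
    result = subprocess.run(tika_command(path), capture_output=True,
                            check=True, timeout=timeout)
    return result.stdout.decode("utf-8", errors="replace")


def update_payload(doc_id, text):
    """Encode one document as a JSON update body.

    The "text" field name is hypothetical; use a field from your schema.
    """
    return json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")


def index_text(doc_id, text, url=SOLR_UPDATE_URL):
    """POST the extracted text to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=update_payload(doc_id, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `index_text("doc-1", extract_text("/path/to/file.doc"))` would then
index a single file. A document that makes Tika blow up or hang costs you one
child process rather than the whole Solr instance, and you can catch the
exception, log the file name, and move on to the next document.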
> solr OOM issues
> ---------------
>
> Key: SOLR-2990
> URL: https://issues.apache.org/jira/browse/SOLR-2990
> Project: Solr
> Issue Type: Bug
> Components: clients - java
> Affects Versions: 4.0
> Environment: CentOS 5.x/6.x
> Solr Build apache-solr-4.0-2011-11-04_09-29-42 (includes tika 1.0)
> java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/log/oom/solr.dump.1 -Dsolr.data.dir=/opt/solr.data
> -Djava.util.logging.config.file=solr-logging.properties -DSTOP.PORT=8907
> -DSTOP.KEY=STOP -jar start.jar
> Reporter: Rob Tulloh
>
> We see intermittent issues with OutOfMemory caused by tika failing to process
> content. Here is an example:
> Dec 29, 2011 7:12:05 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
> at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
> at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
> at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
> at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
> at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
> at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
> at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
> at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
> at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)