[ https://issues.apache.org/jira/browse/SOLR-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177642#comment-13177642 ]
Eric Pugh commented on SOLR-2990:
---------------------------------

I have found that Solr Cell is great for small numbers of documents or for quick prototyping. But as you scale up in either the number or the complexity of documents, it becomes a bottleneck. The Tika CLI is very easy to use, and you can throw more resources at Tika extraction if you do it outside of Solr and then just send the text in, versus doing it inside of Solr. And there's less risk that you bring down Solr!

I wonder if we should put something in the wiki recommending that, if you start having problems with Solr Cell, you move to running Tika outside of Solr, and maybe include some sample code?

Solr Cell is an awesome feature, but it can also cut you!

> solr OOM issues
> ---------------
>
>                 Key: SOLR-2990
>                 URL: https://issues.apache.org/jira/browse/SOLR-2990
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.0
>         Environment: CentOS 5.x/6.x
> Solr Build apache-solr-4.0-2011-11-04_09-29-42 (includes tika 1.0)
> java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/log/oom/solr.dump.1 -Dsolr.data.dir=/opt/solr.data
> -Djava.util.logging.config.file=solr-logging.properties -DSTOP.PORT=8907
> -DSTOP.KEY=STOP -jar start.jar
>            Reporter: Rob Tulloh
>
> We see intermittent issues with OutOfMemory caused by tika failing to process content.
> Here is an example:
>
> Dec 29, 2011 7:12:05 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
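
As a rough illustration of the suggestion in the comment above (run Tika outside of Solr, then send only the extracted text in), here is a minimal sketch in Python. It is not taken from the issue or the wiki. The jar path, the Solr URL, the `text` field name, and the heap and timeout values are all assumptions to adjust for your setup. The idea is that Tika runs in its own JVM with its own heap cap, so a document that blows up the parser kills only that child process and not Solr.

```python
import json
import subprocess
import urllib.request

# Assumptions for illustration only: adjust to your installation.
TIKA_JAR = "tika-app.jar"
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"


def build_tika_command(path, jar=TIKA_JAR, heap="512m"):
    """Build the Tika CLI command line for plain-text extraction."""
    # The child JVM gets its own heap limit, separate from Solr's.
    return ["java", f"-Xmx{heap}", "-jar", jar, "--encoding=UTF-8", "--text", path]


def extract_text(path, timeout=120):
    """Return the extracted text, or None if Tika fails or times out."""
    try:
        result = subprocess.run(
            build_tika_command(path),
            capture_output=True,
            timeout=timeout,
            check=True,  # a non-zero exit (e.g. after an OOM) raises
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout.decode("utf-8", errors="replace")


def build_update_payload(doc_id, text):
    """Serialize one document as a JSON update body (field names assumed)."""
    return json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")


def index_text(doc_id, text, url=SOLR_UPDATE_URL):
    """POST the already-extracted text to Solr and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=build_update_payload(doc_id, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


if __name__ == "__main__":
    text = extract_text("winmail.dat")  # hypothetical input file
    if text is None:
        print("extraction failed; Solr was never involved")
    else:
        print(index_text("winmail.dat", text))
```

With this split, a problem attachment like the TNEF message in the trace above would at worst cause `extract_text` to return `None`, and the caller can log and skip that document.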