[ https://issues.apache.org/jira/browse/SOLR-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177642#comment-13177642 ]

Eric Pugh commented on SOLR-2990:
---------------------------------

I have found that Solr Cell is great for small numbers of documents or for
quick prototyping. But as you scale up in either the number or the complexity
of documents, it becomes a bottleneck. The Tika CLI is very easy to use, and
you can throw more resources at Tika extraction if you run it outside of Solr
and then just send the text in, rather than doing it inside of Solr. There is
also less risk of bringing down Solr! I wonder if we should put something in
the wiki recommending that anyone who starts having problems with Solr Cell
move to running Tika outside of Solr, and maybe include some sample code?
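
As a rough starting point for that kind of sample code, here is a minimal Python sketch of the idea: run the Tika CLI in its own JVM, then post only the extracted text to Solr. The jar path, the Solr URL, the `id` and `text` field names, and all of the function names are assumptions for illustration; adjust them to your install and schema. It also assumes a failed extraction makes the Tika process exit non-zero.

```python
import json
import subprocess
import urllib.request

# Assumed locations, for illustration only; adjust to your installation.
TIKA_JAR = "tika-app.jar"
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"


def extract_text(path, tika_jar=TIKA_JAR, heap="512m", timeout=120):
    """Run the Tika CLI in a separate JVM and return the extracted plain text.

    The child JVM gets its own heap limit, so an OutOfMemoryError or a hang
    on a bad document affects only this process, not Solr.
    """
    cmd = ["java", "-Xmx" + heap, "-jar", tika_jar,
           "--encoding=UTF-8", "--text", path]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            timeout=timeout, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def build_update_body(doc_id, text, text_field="text"):
    """Build a JSON 'add' command for one document (field names assumed)."""
    command = {"add": {"doc": {"id": doc_id, text_field: text}}}
    return json.dumps(command).encode("utf-8")


def send_to_solr(body, url=SOLR_UPDATE_URL):
    """POST the JSON update body to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return response.status


def index_file(path):
    """Extract one file and index its text; skip it if extraction fails."""
    try:
        text = extract_text(path)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as err:
        print("skipping %s: %s" % (path, err))
        return False
    send_to_solr(build_update_body(path, text))
    return True
```

Calling `index_file("some-document.doc")` would extract and index a single file. Since each extraction is a separate process, the heap size and timeout can be tuned per document type, and several extractions can run in parallel without touching Solr's heap.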

Solr Cell is an awesome feature, but it can also cut you!
                
> solr OOM issues
> ---------------
>
>                 Key: SOLR-2990
>                 URL: https://issues.apache.org/jira/browse/SOLR-2990
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.0
>         Environment: CentOS 5.x/6.x
> Solr Build apache-solr-4.0-2011-11-04_09-29-42 (includes tika 1.0)
> java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/var/log/oom/solr.dump.1 -Dsolr.data.dir=/opt/solr.data 
> -Djava.util.logging.config.file=solr-logging.properties -DSTOP.PORT=8907 
> -DSTOP.KEY=STOP -jar start.jar
>            Reporter: Rob Tulloh
>
> We see intermittent OutOfMemoryError issues caused by Tika failing to process 
> content. Here is an example:
> Dec 29, 2011 7:12:05 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)

