[ 
https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348132#comment-15348132
 ] 

Sergey Beryozkin commented on TIKA-2017:
----------------------------------------

It might also be worth trying multipart requests. I've updated the wiki to
note that Metadata, RecursiveMetadata and TikaResource support them:
https://wiki.apache.org/tika/TikaJAXRS#preview

By the way, I recall updating the PDF parser a while back so that it parses
the metadata only, without touching the content; the ContentHandler needs to
be set to null for that. See testPdfParsingMetadataOnly in PDFParserTest. It
might make sense to update other parsers too, though in this case using
multipart requests alone might help.
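
For anyone who wants to try the multipart route from a Python client, here is
a minimal sketch using only the standard library. Several things in it are
assumptions rather than anything confirmed in this thread: the server address
(localhost:9998), the /meta/form path, the "upload" field name, and the helper
names build_multipart and fetch_metadata, which are made up for illustration.
Please check the wiki page above for the actual endpoints.

```python
import io
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, boundary=None):
    """Return (content_type, body) for a single-file multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    buf = io.BytesIO()
    # Opening delimiter for the one and only part.
    buf.write(("--%s\r\n" % boundary).encode("ascii"))
    buf.write(
        ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
         % (field_name, filename)).encode("utf-8")
    )
    # A blank line separates the part headers from the part content.
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(payload)
    # Closing delimiter ends the multipart body.
    buf.write(("\r\n--%s--\r\n" % boundary).encode("ascii"))
    return "multipart/form-data; boundary=%s" % boundary, buf.getvalue()


def fetch_metadata(path, url="http://localhost:9998/meta/form"):
    """POST a file as multipart and return the raw response text.

    The URL and the "upload" field name are assumptions; adjust them to
    match your Tika Server setup.
    """
    with open(path, "rb") as fh:
        content_type, body = build_multipart("upload", path, fh.read())
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Note that this sketch reads the whole file into memory on the client side, so
for a file as large as the one in this issue a streaming client would be a
better fit. It only shows the shape of the request.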

> Tika Server Cannot handle large files
> -------------------------------------
>
>                 Key: TIKA-2017
>                 URL: https://issues.apache.org/jira/browse/TIKA-2017
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Harshavardhan Manjunatha
>             Fix For: 1.14
>
>
> Tika-Python uses Tika REST Server to parse both content & metadata. In this 
> case, the CSV file was 600 MB in size. Tika REST Server runs out of heap 
> space since it tries to parse the content as well. There should be an option 
> to make a REST API call to Tika Server just to parse & return metadata.
> {code}
> Jun 22, 2016 6:38:40 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: /rmeta/text
> java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Java heap space
>         at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
>         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:371)
>         at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>         at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>         at org.eclipse.jetty.server.Server.handle(Server.java:370)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>         at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>         at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.cxf.interceptor.Fault: Java heap space
>         at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:163)
>         at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:129)
>         at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
>         at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
>         at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
>         at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
>         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>         ... 21 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
