[ https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348132#comment-15348132 ]
Sergey Beryozkin commented on TIKA-2017:
----------------------------------------
It might also be worth trying multipart requests. I've updated the wiki to note
that Metadata, RecursiveMetadata and TikaResource support them:
https://wiki.apache.org/tika/TikaJAXRS#preview
By the way, I recall updating the PDF parser a while back so that it parses the
metadata only, without touching the content; for that, the ContentHandler needs
to be set to null. See testPdfParsingMetadataOnly in PDFParserTest. It might
make sense to update other parsers too, though in this case using multiparts
alone might help.
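For illustration, here is a minimal sketch of what a multipart request to the metadata resource could look like from a Python client, using only the standard library. The endpoint path `/meta/form`, the form field name `upload`, the default port 9998 and the JSON `Accept` header are assumptions based on the multipart section of the wiki page above; check them against the server version in use. The helper names `build_multipart_request` and `fetch_metadata` are hypothetical.

```python
import uuid
import urllib.request

# Assumption: a Tika server on its default port, exposing a multipart
# variant of the metadata resource at /meta/form.
TIKA_META_FORM_URL = "http://localhost:9998/meta/form"


def build_multipart_request(url, filename, payload, field_name="upload", boundary=None):
    """Build (but do not send) a multipart/form-data POST with one file part."""
    boundary = boundary or uuid.uuid4().hex
    # Part headers, followed by the blank line that separates them from the content.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # Closing delimiter that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(url, data=head + payload + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    request.add_header("Accept", "application/json")  # assumed response format
    return request


def fetch_metadata(path, url=TIKA_META_FORM_URL):
    """Send the file at `path` as a multipart upload and return the response text."""
    # Reads the whole file into client memory; fine for a sketch, but a very
    # large file would call for a streaming upload instead.
    with open(path, "rb") as source:
        payload = source.read()
    request = build_multipart_request(url, path, payload)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, `fetch_metadata("sample.csv")` would return whatever the metadata resource sends back. Whether this alone avoids the heap problem for a 600 MB file depends on how the server handles the attachment, so treat it as something to try rather than a confirmed fix.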
> Tika Server Cannot handle large files
> -------------------------------------
>
> Key: TIKA-2017
> URL: https://issues.apache.org/jira/browse/TIKA-2017
> Project: Tika
> Issue Type: Bug
> Reporter: Harshavardhan Manjunatha
> Fix For: 1.14
>
>
> Tika-Python uses Tika REST Server to parse both content & metadata. In this
> case, the CSV file was 600 MB in size. Tika REST Server runs out of heap
> space because it tries to parse the content as well. There should be an option
> to make a REST API call to Tika Server just to parse & return metadata.
> {code}
> Jun 22, 2016 6:38:40 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: /rmeta/text
> java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Java heap space
> at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:371)
> at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
> at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
> at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
> at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:370)
> at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
> at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
> at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
> at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
> at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.cxf.interceptor.Fault: Java heap space
> at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:163)
> at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:129)
> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
> at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
> at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
> at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> ... 21 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> {code}