[ https://issues.apache.org/jira/browse/TIKA-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337445#comment-17337445 ]
Julien Massiera commented on TIKA-3372:
---------------------------------------

[~tallison] I tested on a 1.27 build, and the limit across embedded resources works well on a zip archive. However, with a simple PDF file I created, the content is correctly truncated, but there is no "X-TIKA:EXCEPTION:write_limit_reached":"true" metadata entry to indicate that the content has been truncated. I also see a new exception metadata entry for this PDF compared to 1.26:

{code:java}
"X-TIKA:EXCEPTION:runtime": "java.io.IOException: Unable to write a string: This is a simple text for testing purpose \n\tat org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:193)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:785)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1744)\n\tat org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:730)\n\tat org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:395)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:125)\n\tat org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:986)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:269)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:96)\n\tat org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:177)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:240)\n\tat org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:451)\n\tat org.apache.tika.server.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:158)\n\tat 
org.apache.tika.server.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:123)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\n\tat org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.tika.sax.TaggedSAXException\norg.apache.tika.sax.TaggedSAXException\norg.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\tat org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)\n\tat org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)\n\tat org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)\n\tat org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)\n\tat org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)\n\tat 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:311)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:191)\n\t... 50 more\nCaused by: org.apache.tika.sax.TaggedSAXException\norg.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\t... 58 more\nCaused by: org.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:496)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\tat org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\t... 
59 more\n","X-TIKA:EXCEPTION:runtime": "java.io.IOException: Unable to write a string: Fichier de texte servant à tester la recherche dans datafari \n\tat org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:193)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeString(PDFTextStripper.java:785)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeLine(PDFTextStripper.java:1744)\n\tat org.apache.pdfbox.text.PDFTextStripper.writePage(PDFTextStripper.java:730)\n\tat org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:395)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:125)\n\tat org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:986)\n\tat org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:269)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:96)\n\tat org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:177)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:240)\n\tat org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:451)\n\tat org.apache.tika.server.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:158)\n\tat org.apache.tika.server.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:123)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat 
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\n\tat org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.tika.sax.TaggedSAXException\norg.apache.tika.sax.TaggedSAXException\norg.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\tat org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)\n\tat org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)\n\tat org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)\n\tat org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)\n\tat org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)\n\tat org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:311)\n\tat org.apache.tika.parser.pdf.PDF2XHTML.writeString(PDF2XHTML.java:191)\n\t... 
50 more\nCaused by: org.apache.tika.sax.TaggedSAXException\norg.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.sax.TaggedContentHandler.handleException(TaggedContentHandler.java:113)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:148)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\t... 58 more\nCaused by: org.apache.tika.parser.RecursiveParserWrapper$WriteLimitReached\n\tat org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:496)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\tat org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)\n\tat org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)\n\t... 59 more\n",
"X-TIKA:EXCEPTION:warn": [ "Unable to write a string: This is a simple text for testing purpose ", "Unable to end a paragraph" ]
{code}

> Fix writelimit in recursiveparserhandler
> ----------------------------------------
>
>                 Key: TIKA-3372
>                 URL: https://issues.apache.org/jira/browse/TIKA-3372
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> On the dev list, [~julienFL] noted surprising behavior with the new write
> limit in the /rmeta handler. I wasn't able to replicate it, but there is
> clearly a bug in how the write limiting is working. The upshot is that we're
> still effectively write limiting per object not for the full container doc
> and embedded objects.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
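
For anyone consuming /rmeta output while the flag is missing in the PDF case reported above, a client-side check along the following lines might help. This is only a sketch under assumptions: it treats the response as a JSON list of metadata objects, the function names `is_truncated` and `any_truncated` are hypothetical, and the fallback of searching "X-TIKA:EXCEPTION:runtime" for `WriteLimitReached` is a workaround inferred from the stack trace in this comment, not documented Tika behavior.

```python
import json

# Metadata keys taken from the comment above.
WRITE_LIMIT_KEY = "X-TIKA:EXCEPTION:write_limit_reached"
RUNTIME_KEY = "X-TIKA:EXCEPTION:runtime"


def _as_list(value):
    """Normalize a metadata value, which may be absent, a string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_truncated(metadata):
    """Return True if one metadata object suggests the write limit was hit."""
    # Preferred signal: the explicit flag.
    if any(str(v).lower() == "true" for v in _as_list(metadata.get(WRITE_LIMIT_KEY))):
        return True
    # Fallback (assumption): the limit surfaces only as a wrapped runtime
    # exception whose trace names WriteLimitReached, as in the PDF case.
    return any("WriteLimitReached" in str(v) for v in _as_list(metadata.get(RUNTIME_KEY)))


def any_truncated(rmeta_json):
    """Check every metadata object in a JSON list (container and embedded docs)."""
    return any(is_truncated(m) for m in json.loads(rmeta_json))
```

The fallback is deliberately loose: it matches on the exception class name only, so it should be dropped once the explicit flag is set reliably for all parsers.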