[ https://issues.apache.org/jira/browse/TIKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217218#comment-14217218 ]
Sean Zhao commented on TIKA-1482:
---------------------------------
Hello Nick,
Thank you very much for the quick response. Here is the stack trace:
{quote}
org.apache.tika.exception.TikaException: Unexpected error in forked server process
at org.apache.tika.fork.ForkParser.parse(ForkParser.java:158)
at com.tika.TikaForkTest.batchExtractFile(TikaForkTest.java:76)
at com.tika.TikaForkTest.main(TikaForkTest.java:29)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:295)
at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:262)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:132)
at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:314)
at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:262)
at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:188)
at org.apache.pdfbox.pdmodel.common.COSStreamArray.getUnfilteredStream(COSStreamArray.java:197)
at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:110)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:460)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:385)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:344)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:130)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:159)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
{quote}
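The trace above shows the TikaException wrapping a java.lang.OutOfMemoryError raised in the forked process. As a rough sketch of how calling code could check for that programmatically, the following walks an exception's cause chain down to the root. The class RootCause and its methods are hypothetical helpers written for illustration; they are not part of Tika, and the example uses a plain Exception as a stand-in for TikaException so it runs with only the JDK.

```java
// Hypothetical helper, not part of Tika: inspects an exception's cause chain.
class RootCause {

    /** Returns the deepest cause of t, or t itself if it has no cause. */
    static Throwable find(Throwable t) {
        Throwable current = t;
        // Follow getCause() until the chain ends (guarding against self-reference).
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** True if the root cause of t is an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable t) {
        return find(t) instanceof OutOfMemoryError;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped exception in the trace above (assumption:
        // a plain Exception is used here instead of TikaException).
        Exception wrapped = new Exception(
                "Unexpected error in forked server process",
                new OutOfMemoryError("Java heap space"));

        System.out.println("Root cause: " + find(wrapped));
        System.out.println("Out of memory: " + isOutOfMemory(wrapped));
    }
}
```

A check like this would let a batch job tell a heap exhaustion in the child process apart from other parse failures, assuming the forked error is propagated as the cause, as it is in the trace above.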
Best,
Sean
> ForkParser throws exceptions when process some large pdf files
> --------------------------------------------------------------
>
> Key: TIKA-1482
> URL: https://issues.apache.org/jira/browse/TIKA-1482
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Environment: Windows 7_x64 / JDK 1.7.0_17
> Reporter: Sean Zhao
> Priority: Critical
> Fix For: 1.6
>
> Attachments: SRCH-13412.pdf
>
>
> In Tika 1.6, ForkParser throws org.apache.tika.exception.TikaException with the
> message "Unexpected error in forked server process" when parsing some large
> PDF files. Tika 1.3 does not.