[ https://issues.apache.org/jira/browse/TIKA-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347382#comment-14347382 ]
Tim Allison commented on TIKA-1038:
-----------------------------------
[~tilman], I hadn't been, but now am. Thank you. I'll have a chance to look
into this once we dig out from the impending snow. :)
> Parsing PDF with StackOverflowError
> -----------------------------------
>
> Key: TIKA-1038
> URL: https://issues.apache.org/jira/browse/TIKA-1038
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.2
> Reporter: Konstantin Privezentsev
>
> Tika crashes with a StackOverflowError on some PDF documents:
> http://www.ellipse-labo.com/fiches/1303214351.pdf
> http://downloads.joomlacode.org/frsrelease/5/4/0/54089/handbuch_ckforms-DE-1.3.2.pdf
> Code:
> {code:java}
> AutoDetectParser parser = new AutoDetectParser(
>         new TypeDetector(),
>         new PDFParser(),
>         new OfficeParser(),
>         new HtmlParser(),
>         new RTFParser(),
>         new OOXMLParser());
> WriteOutContentHandler contentHandler = new WriteOutContentHandler();
> Metadata metadata = new Metadata();
> parser.parse(contentStream, new BodyContentHandler(contentHandler), metadata,
>         new ParseContext());
> {code}
> Stack trace:
> {code}
> java.lang.StackOverflowError
> at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
> at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
> at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
> at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
> at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
> at java.util.HashMap$KeySet.iterator(HashMap.java:874)
> at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1416)
> at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
> at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
> at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
> at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1421)
> ...
> {code}
>
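
The repeated COSDictionary.toString frames at line 1421 in the trace above
look like a toString() that calls itself on nested values without ever
terminating. One possible cause, which the issue text does not confirm, is a
dictionary that refers back to one of its own ancestors. The sketch below is
not PDFBox code and not the project's fix; it uses plain java.util maps and a
hypothetical CycleSafePrinter class to show how tracking the objects on the
current path (by identity) stops that kind of recursion.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of Tika or PDFBox.
public class CycleSafePrinter {

    /** Renders a nested map, printing "<cycle>" instead of re-entering a map already on the current path. */
    public static String describe(Map<String, Object> root) {
        // Identity-based set: membership checks never call hashCode()/equals()
        // on the maps themselves, which would also recurse on a cyclic map.
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        StringBuilder out = new StringBuilder();
        describe(root, onPath, out);
        return out.toString();
    }

    private static void describe(Object value, Set<Object> onPath, StringBuilder out) {
        if (!(value instanceof Map)) {
            out.append(value);
            return;
        }
        if (!onPath.add(value)) {
            // This map is already being printed further up the call chain.
            out.append("<cycle>");
            return;
        }
        out.append('{');
        boolean first = true;
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
            if (!first) {
                out.append(", ");
            }
            first = false;
            out.append(entry.getKey()).append('=');
            describe(entry.getValue(), onPath, out);
        }
        out.append('}');
        // Leaving this map: the same map may legitimately appear again on a sibling branch.
        onPath.remove(value);
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new LinkedHashMap<>();
        Map<String, Object> child = new LinkedHashMap<>();
        parent.put("Kids", child);
        child.put("Parent", parent); // back-reference that would make a naive toString() recurse forever
        System.out.println(describe(parent)); // prints {Kids={Parent=<cycle>}}
    }
}
```

The guard tracks only the current path rather than every map ever visited, so
a map shared by two siblings is still printed in full both times and only a
true back-reference is cut short.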
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)