[ https://issues.apache.org/jira/browse/PDFBOX-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-4507.
-----------------------------------
    Resolution: Not A Bug

Closing per my previous argument. You can still comment and/or reopen.

> OutOfMemoryError - tika1.19.1.jar
> ---------------------------------
>
>                 Key: PDFBOX-4507
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4507
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.12, 2.0.14
>            Reporter: Ashish Tiwari
>            Priority: Major
>         Attachments: testCmplData.pdf
>
>
> I am trying to parse a PDF file and I am getting an OOM.
> Please find the stack trace below. I was facing a similar issue with docx as well,
> but that is working now with the changes suggested in the attached ticket.
> https://issues.apache.org/jira/browse/TIKA-2847
> PS: this issue happens only if I have -Xmx512m configured; if I change it to
> 1g it starts working fine.
> {code:java}
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
>     at java.nio.CharBuffer.allocate(CharBuffer.java:335)
>     at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:795)
>     at org.apache.pdfbox.pdfparser.BaseParser.isValidUTF8(BaseParser.java:782)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSName(BaseParser.java:762)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:278)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:212)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:862)
>     at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:84)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectStream(COSParser.java:994)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:880)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:794)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:754)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:185)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1160)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1133)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> {code}
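
Since the report hinges on the heap limit (-Xmx512m fails, 1g works), it may help to confirm which limit the JVM is actually running with before reproducing. The following is only a minimal sketch using the standard `Runtime.maxMemory()` call; the class name `HeapLimitCheck` and the helper `toMiB` are hypothetical names chosen for illustration and are not part of PDFBox or Tika.

```java
// Hypothetical helper, not part of PDFBox or Tika: prints the heap
// ceiling the running JVM reports, to verify the -Xmx setting in effect.
class HeapLimitCheck {

    // Converts a byte count to whole mebibytes, rounding down.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap the JVM will attempt to use.
        // Depending on the collector, it can be somewhat lower than -Xmx.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MiB): " + toMiB(maxHeapBytes));
    }
}
```

Running it with the same JVM options as the failing parse (for example `java -Xmx512m HeapLimitCheck`) shows whether the intended limit was picked up.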