[ https://issues.apache.org/jira/browse/PDFBOX-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737205#comment-13737205 ]

Tilman Hausherr commented on PDFBOX-1692:
-----------------------------------------

Same for me with the current 2.0, using loadNonSeq() and convertToImage()
with -Xmx1024m (until now I used -Xmx768m).
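When comparing runs with different -Xmx values, it can help to confirm that the setting actually reached the JVM, since IDE run configurations sometimes pass VM options differently than expected. The following is only a minimal sketch using the standard java.lang.Runtime API; the class name HeapReport and its helper methods are hypothetical and not part of PDFBox or of the test program above.

```java
public class HeapReport {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in whole MiB (reflects -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in whole MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx1024m HeapReport" and compare the reported maximum.
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedHeapMiB());
    }
}
```

The reported maximum is usually somewhat below the -Xmx value, because the JVM excludes part of the heap from the figure. Printing the used heap before and after each page conversion would show roughly where the growth happens.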

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.charset.CoderResult$1.create(CoderResult.java:224)
        at java.nio.charset.CoderResult$Cache.get(CoderResult.java:213)
        at java.nio.charset.CoderResult$Cache.access$200(CoderResult.java:195)
        at java.nio.charset.CoderResult.malformedForLength(CoderResult.java:234)
        at sun.nio.cs.UnicodeDecoder.decodeLoop(UnicodeDecoder.java:115)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:160)
        at java.lang.StringCoding.decode(StringCoding.java:193)
        at java.lang.String.<init>(String.java:416)
        at java.lang.String.<init>(String.java:481)
        at org.apache.fontbox.cmap.CMapParser.createStringFromBytes(CMapParser.java:618)
        at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:224)
        at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:482)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.extractToUnicodeEncoding(PDSimpleFont.java:339)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.determineEncoding(PDSimpleFont.java:307)
        at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:123)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:73)
        at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:62)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
        at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:203)
        at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:580)
        at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:529)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:258)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:151)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
        at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:84)
        at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:56)
Java Result: 1
BUILD SUCCESSFUL (total time: 1 minute 53 seconds)
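
The top frames of the trace above (UnicodeDecoder.decodeLoop calling CoderResult.malformedForLength, reached from a String constructor) are the JDK path for input bytes that are not valid UTF-16. As a rough illustration of what that path does, and not a claim about the cause of this bug, the sketch below decodes a byte array with a dangling trailing byte. The class and method names are hypothetical; only the String constructor and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class MalformedDecode {

    /**
     * Decodes bytes as UTF-16. Without a byte-order mark the JDK decoder
     * assumes big-endian. Malformed input is not rejected by this String
     * constructor; it is replaced with U+FFFD.
     */
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // 0x00 0x41 is 'A' in UTF-16BE; the trailing 0x00 is half a code unit.
        byte[] bytes = {0x00, 0x41, 0x00};
        String s = decodeUtf16(bytes);
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("char %d: U+%04X%n", i, (int) s.charAt(i));
        }
    }
}
```

Each such decode of malformed input allocates only a little, so frames like these appearing at the top of an OutOfMemoryError trace mostly show where the JVM happened to be when the GC overhead limit was reached.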


With -Xmx2048m:

run:
test_1fd9a_test.pdf: Total pages: 2, size: 168046 bytes, AVG: 84023 bytes
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
        at java.lang.StringCoding.decode(StringCoding.java:193)
        at java.lang.String.<init>(String.java:416)
        at java.lang.String.<init>(String.java:481)
        at org.apache.fontbox.cmap.CMapParser.createStringFromBytes(CMapParser.java:618)
        at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:224)
        at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:482)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.extractToUnicodeEncoding(PDSimpleFont.java:339)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.determineEncoding(PDSimpleFont.java:307)
        at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:123)
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:73)
        at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:62)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
        at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:203)
        at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:580)
        at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:529)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:258)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:225)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:205)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:151)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
        at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:84)
        at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:56)
Java Result: 1
BUILD SUCCESSFUL (total time: 2 minutes 30 seconds)

                
> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: PDFBOX-1692
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1692
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.2
>         Environment: Windows 7
> java version 1.7.0_17 (build 1.7.0_17-b02/64-Bit Server VM build 23.7-01)
> pdfbox-app-1.8.2.jar
>            Reporter: Christian Czech
>         Attachments: test_1fd9a_test.pdf
>
>
> Hello,
> I have a problem with text extraction:
> the VM runs out of memory while the text is being extracted.
> My code:
> // Backslashes in a Java string literal must be escaped.
> String pdfFile = "D:\\testfolder\\test1fd9a_test.pdf"; // file size: 168 KB
> PDDocument document = PDDocument.load(pdfFile, true);
> PDFTextStripper stripper = null;
> try {
>     stripper = new PDFTextStripper();
>     stripper.setSortByPosition(true);
>     // outputWriter is a java.io.Writer created elsewhere (not shown in the report)
>     stripper.writeText(document, outputWriter);
> } catch (IOException e) {
>     // exception type assumed; the original snippet left the catch clause empty
> }
> This produces the error:
> java.lang.OutOfMemoryError: Java heap space
