[ https://issues.apache.org/jira/browse/PDFBOX-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755516#comment-13755516 ]
Tilman Hausherr commented on PDFBOX-1692:
-----------------------------------------
Yes, thanks, it works after the change. I can't remember if I really saw an EOF
or if it was an assumption because of the -1. I also tested whether the
endcodes are ever -1 when "if (Arrays.equals(startCode, tokenBytes))" wasn't
true, and it never happened with any of my test files. So you can close this :-)
> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
> Key: PDFBOX-1692
> URL: https://issues.apache.org/jira/browse/PDFBOX-1692
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.2
> Environment: Windows 7
> java version 1.7.0_17 (build 1.7.0_17-b02/64-Bit Server VM build 23.7-01)
> pdfbox-app-1.8.2.jar
> Reporter: Christian Czech
> Attachments: Errors_when_buidling_pdfbox.jpg.png, PDFBOX-1692.patch,
> test_1fd9a_test-01.png, test_1fd9a_test-02.png, test_1fd9a_test.pdf
>
>
> Hello,
> I have a problem with text extraction: the JVM runs out of heap memory
> while the text is being extracted.
> My code:
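> String pdfFile = "D:\\testfolder\\test1fd9a_test.pdf"; // file size: 168 KB
> PDDocument document = PDDocument.load(pdfFile, true);
> PDFTextStripper stripper = null;
> try {
>     stripper = new PDFTextStripper();
>     stripper.setSortByPosition(true);
>     // outputWriter is a java.io.Writer declared elsewhere (not shown in the report)
>     stripper.writeText(document, outputWriter);
> } catch (IOException e) {
>     // exception type was left empty in the report; writeText throws IOException
> }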
> You get an error:
> java.lang.OutOfMemoryError: Java heap space