[ https://issues.apache.org/jira/browse/PDFBOX-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maruan Sahyoun closed PDFBOX-899.
---------------------------------
Resolution: Fixed
Text extraction has worked fine since rev 1063402, and the user didn't come back.
> OutOfMemoryError with PDFTextStripper
> -------------------------------------
>
> Key: PDFBOX-899
> URL: https://issues.apache.org/jira/browse/PDFBOX-899
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.3.1
> Environment: java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode)
> Reporter: Alexander Veit
> Priority: Critical
> Attachments: PDFBOX-899.patch
>
>
> PDFBox 1.3.1 has high memory demands when stripping text from PDF files.
> http://www.unicode.org/Public/5.1.0/charts/CodeCharts.pdf even crashes an
> application server by requiring an estimated additional 300MB+ of heap memory. The
> heap dump suggests that PDFStreamEngine#documentFontCache might be the root
> of the leaking objects.
> PDFBox 1.0.0 did not show this behaviour.