James Hardwick created TIKA-1462:
------------------------------------
Summary: PDFont consumes all heap space
Key: TIKA-1462
URL: https://issues.apache.org/jira/browse/TIKA-1462
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.6
Reporter: James Hardwick
Priority: Critical
See https://issues.apache.org/jira/browse/PDFBOX-2200 for more details.
In short, PDFont never releases the resources it holds, so it eventually accumulates
enough objects to consume all available memory. We are encountering this in
production environments, where it causes our Solr server to crash when ingesting
large numbers of PDF documents.
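To make the failure mode concrete, here is a minimal Java sketch of how a process-wide
cache that is never cleared can grow without bound during bulk ingestion. It assumes,
for illustration only, that the retained objects live in a static map keyed per font.
`FontCacheSketch`, `ingest`, and `LruCache` are hypothetical names. They do not
reproduce PDFBox's actual PDFont internals or the workaround described in PDFBOX-2200.
The bounded variant uses `LinkedHashMap.removeEldestEntry` as one generic way to cap
such a cache.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration (NOT PDFBox code): a process-wide font cache.
 * An unbounded static map keeps every cached font alive for the life of the JVM,
 * so memory grows with the number of distinct fonts ingested.
 */
public class FontCacheSketch {

    // Unbounded: one entry is added per distinct font and none is ever removed.
    static final Map<String, byte[]> UNBOUNDED = new HashMap<>();

    // Bounded LRU alternative: evicts the least recently used entry once full.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true yields LRU iteration order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Called after each insertion; drop the eldest entry when over capacity.
            return size() > maxEntries;
        }
    }

    static final Map<String, byte[]> BOUNDED = new LruCache<>(100);

    // Hypothetical stand-in for parsing one document that uses the given font.
    static void ingest(Map<String, byte[]> cache, String fontKey) {
        if (!cache.containsKey(fontKey)) {
            cache.put(fontKey, new byte[1024]); // placeholder for cached font data
        }
    }

    public static void main(String[] args) {
        for (int doc = 0; doc < 10_000; doc++) {
            String fontKey = "font-" + doc; // assume each document embeds a distinct font
            ingest(UNBOUNDED, fontKey);
            ingest(BOUNDED, fontKey);
        }
        System.out.println("unbounded entries: " + UNBOUNDED.size());
        System.out.println("bounded entries:   " + BOUNDED.size());
    }
}
```

Running `main` leaves 10,000 entries in the unbounded map and 100 in the bounded one.
In a long-running process that keeps ingesting new documents, the unbounded case
would keep growing until the heap is exhausted.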
The fix is reportedly targeted for the PDFBox 2.0.0 release, but that release has
been pending for so long that I'd suggest implementing the workaround proposed in
the PDFBox issue.