Tim Barrett created TIKA-2848:
---------------------------------
Summary: This file consumes an inordinate amount of memory when
parsed by Tika
Key: TIKA-2848
URL: https://issues.apache.org/jira/browse/TIKA-2848
Project: Tika
Issue Type: Bug
Reporter: Tim Barrett
Attachments: Yearbook_1997_r.pdf, Yearbook_2013_s.pdf
When this document is parsed by Tika, upwards of 4 GB of JVM memory is used.
With 5 GB allocated, all of the memory is used and an inordinate amount of
time is spent garbage collecting. These are quite old PDFs that were created by
a Canon OCR scanner. This can easily be reproduced by using the CLI.
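
A minimal sketch of one way to drive such a CLI reproduction, assuming a Tika app jar is available locally. The helper name build_tika_command, the jar path, the --text output mode, and the -verbose:gc flag are illustrative assumptions and are not taken from this report; only the 5 GB heap figure and the attachment name come from the description above.

```python
import subprocess
import sys


def build_tika_command(jar_path, pdf_path, max_heap="5g"):
    """Return the argv list for running the Tika app CLI with a capped heap.

    jar_path and pdf_path are supplied by the caller (hypothetical locations).
    """
    return [
        "java",
        f"-Xmx{max_heap}",  # cap the JVM heap, e.g. 5g as in the description
        "-verbose:gc",      # print garbage-collection activity while parsing
        "-jar", jar_path,
        "--text",           # ask the Tika app for plain-text output
        pdf_path,
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python repro.py /path/to/tika-app.jar Yearbook_1997_r.pdf
    cmd = build_tika_command(sys.argv[1], sys.argv[2])
    print("Running:", " ".join(cmd))
    # Extracted text is discarded; only memory and GC behaviour are of interest.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
```

Watching the GC log lines while this runs should show whether the heap fills up and collection time dominates, as described above.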
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)