Tim Barrett created TIKA-2848:
---------------------------------

             Summary: This file consumes an inordinate amount of memory when 
parsed by Tika
                 Key: TIKA-2848
                 URL: https://issues.apache.org/jira/browse/TIKA-2848
             Project: Tika
          Issue Type: Bug
            Reporter: Tim Barrett
         Attachments: Yearbook_1997_r.pdf, Yearbook_2013_s.pdf

When this document is parsed by Tika, upwards of 4 GB of JVM memory is used. 
With 5 GB allocated, all of the memory is used and an inordinate amount of 
time is spent garbage collecting. These are quite old PDFs that were created by 
a Canon OCR scanner. This can easily be reproduced by using the CLI.
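
To put a number on the memory use described above, peak heap can be sampled while a parse runs. The following is a minimal sketch using only the JDK, not part of the original report: the class name HeapProbe, the method peakHeapBytes, and the placeholder workload are all hypothetical, and the actual Tika parse call would have to be substituted for the workload.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: samples used heap while a task runs and reports the
// highest value observed. Sampling can miss short spikes between samples.
final class HeapProbe {

    static long peakHeapBytes(Runnable task, long sampleMillis) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        AtomicLong peak = new AtomicLong(memory.getHeapMemoryUsage().getUsed());

        // Background thread that records the maximum used heap seen so far.
        Thread sampler = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                peak.accumulateAndGet(memory.getHeapMemoryUsage().getUsed(), Math::max);
                try {
                    Thread.sleep(sampleMillis);
                } catch (InterruptedException e) {
                    return; // asked to stop
                }
            }
        });
        sampler.setDaemon(true);
        sampler.start();
        try {
            task.run();
        } finally {
            sampler.interrupt();
            sampler.join();
        }
        // Take one last sample after the task has finished.
        peak.accumulateAndGet(memory.getHeapMemoryUsage().getUsed(), Math::max);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption): replace with the parse of the
        // attached PDF. Here it just allocates 64 blocks of 1 MiB each.
        long peak = peakHeapBytes(() -> {
            byte[][] blocks = new byte[64][];
            for (int i = 0; i < blocks.length; i++) {
                blocks[i] = new byte[1 << 20];
            }
        }, 10);
        System.out.printf("peak heap observed: %d MiB%n", peak / (1024 * 1024));
    }
}
```

Running the same probe under different -Xmx settings would show whether the parse simply fills whatever heap it is given or has a real floor near the 4 GB reported here.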



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
