[jira] [Commented] (TIKA-2848) This file consumes an inordinate amount of memory when parsed by Tika

Tim Barrett (JIRA) Sun, 07 Apr 2019 09:01:56 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811914#comment-16811914
 ]


Tim Barrett commented on TIKA-2848:
-----------------------------------

Thanks Tim, I’ll try that. Nice to know that Tims are at it on a Sunday! :-)



> This file consumes an inordinate amount of memory when parsed by Tika
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2848
>                 URL: https://issues.apache.org/jira/browse/TIKA-2848
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Barrett
>            Priority: Major
>         Attachments: Yearbook_1997_r.pdf, Yearbook_2013_s.pdf
>
>
> When this document is parsed by Tika, upwards of 4 GB of JVM memory is used.
> With 5 GB allocated, all of the memory is used and an inordinate amount of
> time is spent garbage collecting. These are quite old PDFs that were created
> by a Canon OCR scanner. This can easily be reproduced by using the CLI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
