[ https://issues.apache.org/jira/browse/PDFBOX-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180386#comment-14180386 ]
John Hewson commented on PDFBOX-2445:
-------------------------------------
{quote}
John Hewson, couldn't we change PDFTextStripper so that it no longer uses
document.getDocumentCatalog().getAllPages()? As I understand it, that call
loads everything. Or has that already changed?
{quote}
Yes, but that API is not in 1.8, though it could be added. Unless the problem
can be reproduced, I wouldn't bother.
> Out of Memory - Extract text for Apache_Solr_4.7_Ref_Guide.pdf
> --------------------------------------------------------------
>
> Key: PDFBOX-2445
> URL: https://issues.apache.org/jira/browse/PDFBOX-2445
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, PDModel
> Affects Versions: 1.8.7, 2.0.0
> Reporter: Maruan Sahyoun
>