[
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863286#comment-16863286
]
Andreas Lehmkühler commented on PDFBOX-4569:
--------------------------------------------
Here are some more details about the current status of the implementation, to
give a clearer picture of what to expect:
- the parser stops after reading the trailer information -> the load time is
much shorter and the initial memory footprint is much smaller
- objects are loaded once they are referenced
- loaded objects won't be freed automatically if they aren't needed any more
- if an object is located in an object stream, all objects in that stream are
loaded as soon as one of them is needed
- all of this is done automatically and without any changes to the public API
for loading PDFs
That said, there is still room for improvement.
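To make the loading behaviour listed above more concrete, here is a minimal, self-contained sketch. It is not the PDFBox implementation: the class LazyObjectPool and all of its methods are hypothetical names chosen for illustration, and "parsing" is reduced to building a string. It only assumes what the list states: nothing is parsed up front, an object is parsed when it is first referenced, loaded objects are never freed, and touching one member of an object stream loads all of its members.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names come from PDFBox.
class LazyObjectPool {
    // Object number -> all members of the object stream that contains it.
    // Objects stored directly in the file have no entry here.
    private final Map<Integer, List<Integer>> streamMembers = new HashMap<>();

    // Objects parsed so far. Entries are never evicted, mirroring
    // "loaded objects won't be freed automatically".
    private final Map<Integer, String> loaded = new HashMap<>();

    // Register a group of objects that share one object stream.
    void addObjectStream(List<Integer> members) {
        for (int n : members) {
            streamMembers.put(n, members);
        }
    }

    // Dereference an object; it is parsed on first access only.
    String get(int objectNumber) {
        if (!loaded.containsKey(objectNumber)) {
            // An object inside an object stream pulls in the whole stream.
            List<Integer> group =
                streamMembers.getOrDefault(objectNumber, List.of(objectNumber));
            for (int n : group) {
                loaded.putIfAbsent(n, parse(n));
            }
        }
        return loaded.get(objectNumber);
    }

    // Number of objects currently held in memory.
    int loadedCount() {
        return loaded.size();
    }

    // Stand-in for real parsing work.
    private String parse(int objectNumber) {
        return "obj " + objectNumber;
    }

    public static void main(String[] args) {
        LazyObjectPool pool = new LazyObjectPool();
        pool.addObjectStream(List.of(3, 4, 5));
        System.out.println("after open: " + pool.loadedCount());
        pool.get(1); // direct object: only this one is parsed
        System.out.println("after get(1): " + pool.loadedCount());
        pool.get(4); // object stream member: 3, 4 and 5 are parsed together
        System.out.println("after get(4): " + pool.loadedCount());
    }
}
```

In this toy model, processing only a few objects keeps the pool small, while touching everything ends up with the same set of objects in memory as an up-front parse would.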
[~tilman] I don't expect a big impact on performance when rendering all
pages of a PDF, as most of the objects usually have to be loaded in such cases.
[~msahyoun] Depending on the use case, there could be huge improvements if only
parts of those PDFs are processed. See my earlier comment about rendering.
[[email protected]] Yes, you can use the same API, but you have to build
your own version from the branch.
> Implement an ondemand Parser
> ----------------------------
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser