[
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865813#comment-16865813
]
Andreas Lehmkühler commented on PDFBOX-4569:
--------------------------------------------
[~msahyoun] regarding caching/optimized memory consumption:
I'd like to do some refactoring first. Nevertheless, I have some (similar) ideas
on how to optimize the parsing itself:
- introduce a switch to deactivate the caching of indirectly referenced objects
- make caching dependent on the kind of object
- find a way to connect objects to their page so that such objects can be
removed once the page has been processed
- use weak references for cached objects so that the GC can remove those
objects if needed
- use a memory-mapped file to optimize the file handling
Some of them are easier and others are harder to implement. I'd start with the
easier ones and see whether it's worth thinking about the harder ones.
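To make two of the ideas above more concrete (the caching switch and the weak
references), here is a rough sketch. It is not PDFBox code: the class
WeakObjectCache, its constructor flag and the loader function are all
hypothetical names chosen for illustration, and the real parser would need to
key on object number/generation and handle more cases.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not part of the PDFBox API.
// K stands in for an object key, V for a parsed object.
final class WeakObjectCache<K, V> {
    private final Map<K, WeakReference<V>> cache = new HashMap<>();
    private final boolean cachingEnabled;   // the "switch" to deactivate caching
    private final Function<K, V> loader;    // assumed: parses the object on demand

    WeakObjectCache(boolean cachingEnabled, Function<K, V> loader) {
        this.cachingEnabled = cachingEnabled;
        this.loader = loader;
    }

    V get(K key) {
        if (!cachingEnabled) {
            // Caching is switched off: always parse again.
            return loader.apply(key);
        }
        WeakReference<V> ref = cache.get(key);
        V value = (ref != null) ? ref.get() : null;
        if (value == null) {
            // Never loaded, or already reclaimed by the GC: load and re-cache.
            value = loader.apply(key);
            cache.put(key, new WeakReference<>(value));
        }
        return value;
    }
}
```

As long as a caller holds a strong reference to the returned object, repeated
get() calls return the same instance; once nothing else references it, the GC
is free to reclaim it and the next get() parses it again. Stale map entries
are not cleaned up in this sketch.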
> Implement an ondemand Parser
> ----------------------------
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser