[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025332#comment-17025332
 ] 

Andreas Lehmkühler commented on PDFBOX-4569:
--------------------------------------------

The hardest part was the reintegration of the branch in the trunk ;-)

I stumbled upon the overwrite issue when loading a malformed PDF. The parser 
itself modifies some of the pages to repair the page count; I didn't 
change anything at all. BTW, if we reread all objects instead of caching them, 
the reread objects aren't the same instances as the ones read first, but they 
should be equal. I stumbled upon some code using "==" instead of "equals", and 
simply switching to "equals" didn't work because it broke some other cases. 
This seems related to PDFBOX-4723.
IMHO we need to analyze the situation first to be able to come up with a 
possible solution.
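
To illustrate the "==" versus "equals" point, here is a minimal sketch in plain 
Java. ParsedObject and readFromFile are hypothetical stand-ins invented for this 
example, not PDFBox classes; the sketch only assumes that rereading produces a 
fresh instance with the same content.

```java
import java.util.Objects;

public class RereadIdentityDemo {

    // Hypothetical stand-in for a parsed PDF object; not a PDFBox class.
    static final class ParsedObject {
        private final long objectNumber;
        private final int generation;
        private final String content;

        ParsedObject(long objectNumber, int generation, String content) {
            this.objectNumber = objectNumber;
            this.generation = generation;
            this.content = content;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof ParsedObject)) {
                return false;
            }
            ParsedObject that = (ParsedObject) other;
            return objectNumber == that.objectNumber
                    && generation == that.generation
                    && content.equals(that.content);
        }

        @Override
        public int hashCode() {
            return Objects.hash(objectNumber, generation, content);
        }
    }

    // Simulates parsing an object from the file without a cache:
    // every call creates a new instance with identical content.
    static ParsedObject readFromFile(long objectNumber) {
        return new ParsedObject(objectNumber, 0, "<< /Type /Page >>");
    }

    public static void main(String[] args) {
        ParsedObject first = readFromFile(3);
        ParsedObject reread = readFromFile(3);

        System.out.println(first == reread);      // false: two distinct instances
        System.out.println(first.equals(reread)); // true: same content
    }
}
```

Code that relies on "==" silently assumes a cache that always hands back the 
same instance, which is why dropping the cache changes its behaviour.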

I'd like to follow up on the idea of using memory-mapped files, especially as 
[~torakiki] posted a very promising hint on dev@.

However, we should discuss this on dev@ or create new tickets.
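
For reference, a minimal sketch of what reading through a memory-mapped file 
looks like with the standard java.nio API. It is not necessarily the approach 
hinted at on dev@, and MappedReadDemo, readAt and demo are hypothetical names 
used only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadDemo {

    // Maps only the requested region read-only and copies it into a byte array.
    // An on-demand parser could use this to fetch a single object by its offset.
    static byte[] readAt(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] result = new byte[length];
            buffer.get(result);
            return result;
        }
    }

    // Writes a small sample file and reads its first eight bytes back via the mapping.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".pdf");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, "%PDF-1.4\n% rest of file\n".getBytes(StandardCharsets.US_ASCII));
            return new String(readAt(tmp, 0, 8), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "%PDF-1.4"
    }
}
```

Mapping a region per read keeps the example short; a real parser would more 
likely keep larger mappings open and reuse them.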


> Implement an ondemand Parser
> ----------------------------
>
>                 Key: PDFBOX-4569
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4569
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
