[
https://issues.apache.org/jira/browse/PDFBOX-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602725#comment-17602725
]
Andreas Lehmkühler commented on PDFBOX-5499:
--------------------------------------------
{quote}
For my document, the biggest SmallMap contains 40923 entries.
{quote}
Wow, that's a lot; it sounds like a very rare corner case. What kind of
dictionary is that? Look for the COSName.TYPE entry. Where does this PDF come
from? Did you or your customer create it, or is it just some PDF from the wild?
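
To see why one map with 40923 entries could matter so much, here is a minimal sketch. It assumes that SmallMap keeps its keys and values in one flat array and scans it linearly on every lookup; that is an assumption about the implementation, not something stated in this thread. The names ArrayMap and comparisonsToFill are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;

public class LinearScanCost {

    /** Hypothetical array-backed map: keys and values alternate in one array. */
    static final class ArrayMap {
        private Object[] slots = new Object[0];
        long comparisons = 0;

        Object put(Object key, Object value) {
            // Linear scan over all existing keys, counting each comparison.
            for (int i = 0; i < slots.length; i += 2) {
                comparisons++;
                if (slots[i].equals(key)) {
                    Object old = slots[i + 1];
                    slots[i + 1] = value;
                    return old;
                }
            }
            // Key not found: grow the array by one key/value pair.
            slots = Arrays.copyOf(slots, slots.length + 2);
            slots[slots.length - 2] = key;
            slots[slots.length - 1] = value;
            return null;
        }

        int size() {
            return slots.length / 2;
        }
    }

    /** Counts the key comparisons needed to insert n distinct keys. */
    static long comparisonsToFill(int n) {
        ArrayMap map = new ArrayMap();
        for (int i = 0; i < n; i++) {
            map.put("key" + i, i);
        }
        return map.comparisons;
    }

    public static void main(String[] args) {
        int n = 40923; // entry count quoted above
        System.out.println("comparisons for " + n + " inserts: " + comparisonsToFill(n));
    }
}
```

Under that assumption, inserting n distinct keys costs n(n-1)/2 key comparisons, which is about 837 million for n = 40923, while a hash-based map needs roughly one lookup per insert. That would fit the jump from milliseconds to seconds described in the issue.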
> Performance issue since 2.0.18
> ------------------------------
>
> Key: PDFBOX-5499
> URL: https://issues.apache.org/jira/browse/PDFBOX-5499
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.19
> Reporter: Thomas Debray Luyat
> Priority: Major
> Attachments: image-2022-09-05-12-48-04-608.png,
> image-2022-09-05-17-37-55-155.png, image-2022-09-05-17-40-22-416.png,
> image-2022-09-05-19-55-40-753.png
>
>
> Our PDF is parsed in less than 200ms in 2.0.18 and more than 8 seconds in
> 2.0.19. The same issue is still there in 2.0.26.
>
> SmallMap was introduced in version 2.0.19, and we have been facing a
> performance issue since that change.
> !image-2022-09-05-12-48-04-608.png|width=968,height=377!
> We patched our code to simply replace the SmallMap implementation like this:
> {code:java}
> package org.apache.pdfbox.util;
>
> import java.util.LinkedHashMap;
>
> public class SmallMap<K, V> extends LinkedHashMap<K, V> {
>     // nothing: use the standard LinkedHashMap
> }
> {code}
> With this change, the performance issue disappears.
> Our test is really simple:
> {code:java}
> long start = System.currentTimeMillis();
> try (PDDocument document = PDDocument.load(new File(inFile))) {
>     // nothing: only parsing is measured
> }
> long duration = System.currentTimeMillis() - start;
> assertTrue(duration < 500);
> {code}
>
> I understand that SmallMap can solve issues in some cases, but would it be
> possible to implement a factory that creates this map, so that we can
> configure which Map implementation is used?
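
The factory idea raised at the end of the issue could look roughly like the sketch below. Everything in it is hypothetical: MapFactorySketch, setMapSupplier and the stand-in Dictionary class are illustration names only, the key and value types are simplified to String and Object, and nothing in this thread says PDFBox offers such a hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

public class MapFactorySketch {

    // Hypothetical global hook; defaults to LinkedHashMap, which keeps insertion order.
    private static volatile Supplier<Map<String, Object>> mapSupplier = LinkedHashMap::new;

    /** Lets the application choose the Map implementation used by new dictionaries. */
    public static void setMapSupplier(Supplier<Map<String, Object>> newSupplier) {
        mapSupplier = Objects.requireNonNull(newSupplier, "newSupplier");
    }

    /** Stand-in for a dictionary class that would otherwise hard-code its map type. */
    public static final class Dictionary {
        // The backing map comes from the configured supplier at construction time.
        private final Map<String, Object> items = mapSupplier.get();

        public void setItem(String key, Object value) {
            items.put(key, value);
        }

        public Object getItem(String key) {
            return items.get(key);
        }

        public Class<?> backingType() {
            return items.getClass();
        }
    }
}
```

An application that cares more about memory than lookup speed could then install a compact map once at startup, for example with MapFactorySketch.setMapSupplier(TreeMap::new) as a placeholder choice, while others keep the default. Dictionaries created before the call keep the map they already have.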