[ https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Prates updated PDFBOX-5824:
------------------------------------
Description:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
controls which Map class is used to optimize memory usage. By default, a
SmallMap is used. However, if the number of items in a COSDictionary reaches
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]
to a LinkedHashMap.
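As a rough illustration of the switch described above, the sketch below models a dictionary that starts with a small, list-backed map and copies its references into a LinkedHashMap once the entry count reaches a threshold. `ArrayBackedMap` and `ThresholdDictionary` are hypothetical stand-ins, not PDFBox's `SmallMap` or `COSDictionary`, and keys are plain strings rather than `COSName`; it only assumes the general behaviour the description states.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class ThresholdMapDemo {

    /** Hypothetical stand-in for a small map: entries kept in a list, linear lookup. */
    static final class ArrayBackedMap<K, V> extends AbstractMap<K, V> {
        private final List<Map.Entry<K, V>> entries = new ArrayList<>();

        @Override
        public V put(K key, V value) {
            for (Map.Entry<K, V> e : entries) {
                if (Objects.equals(e.getKey(), key)) {
                    return e.setValue(value); // replace value, keep original position
                }
            }
            entries.add(new AbstractMap.SimpleEntry<>(key, value));
            return null;
        }

        @Override
        public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override
                public Iterator<Map.Entry<K, V>> iterator() {
                    return entries.iterator();
                }

                @Override
                public int size() {
                    return entries.size();
                }
            };
        }
    }

    /** Hypothetical dictionary that upgrades to a LinkedHashMap at the threshold. */
    static final class ThresholdDictionary {
        private final int threshold;
        private Map<String, Object> items = new ArrayBackedMap<>();

        ThresholdDictionary(int threshold) {
            this.threshold = threshold;
        }

        void setItem(String key, Object value) {
            items.put(key, value);
            // Once the small map holds `threshold` entries, copy every reference
            // into a LinkedHashMap; the copy keeps insertion order.
            if (items instanceof ArrayBackedMap && items.size() >= threshold) {
                items = new LinkedHashMap<>(items);
            }
        }

        Object getItem(String key) {
            return items.get(key);
        }

        boolean usesLinkedHashMap() {
            return items instanceof LinkedHashMap;
        }

        Set<String> keySet() {
            return items.keySet();
        }
    }

    public static void main(String[] args) {
        ThresholdDictionary dict = new ThresholdDictionary(3);
        dict.setItem("Type", "Page");
        dict.setItem("Parent", "1 0 R");
        System.out.println(dict.usesLinkedHashMap()); // false: below the threshold
        dict.setItem("Contents", "4 0 R");
        System.out.println(dict.usesLinkedHashMap()); // true: size reached 3, entries copied
        System.out.println(dict.keySet());            // [Type, Parent, Contents]
    }
}
```

With the real threshold of 1,000, every dictionary that grows past that size pays for one full copy, which is the cost the issue reports for large documents.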
For larger documents, where COSDictionary instances are expected to be
substantially bigger than this limit, this copying occurs frequently.
Additionally, [SmallMap.keySet is not efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
The attached screenshot shows PDFBox performance with SmallMap (in red) versus
using LinkedHashMap and ignoring the threshold (in green).
*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System
property?*
If set to 0, LinkedHashMap would be used. If not set, it would default to the
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.
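A minimal sketch of how the proposed property could be read, assuming a hypothetical property name `org.apache.pdfbox.cos.mapThreshold` (the issue does not name one). An unset property keeps the current default of 1,000, and 0 signals that a LinkedHashMap should be used from the start. Falling back to the default for negative or non-numeric values is an assumption of this sketch, not part of the proposal.

```java
public final class MapThresholdConfig {

    /** Hypothetical property name; the issue does not specify one. */
    static final String PROPERTY = "org.apache.pdfbox.cos.mapThreshold";

    /** Current hard-coded value of COSDictionary.MAP_THRESHOLD. */
    static final int DEFAULT_THRESHOLD = 1000;

    private MapThresholdConfig() {
    }

    /**
     * Reads the threshold from the system property. Falls back to the default
     * when the property is unset, not a number, or negative, so existing
     * behaviour is unchanged unless a valid value is supplied.
     */
    static int resolveThreshold() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return DEFAULT_THRESHOLD;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value < 0 ? DEFAULT_THRESHOLD : value;
        } catch (NumberFormatException e) {
            return DEFAULT_THRESHOLD;
        }
    }

    /** A threshold of 0 means: skip the small map and start with a LinkedHashMap. */
    static boolean startWithLinkedHashMap(int threshold) {
        return threshold == 0;
    }

    public static void main(String[] args) {
        int threshold = resolveThreshold();
        System.out.println("threshold=" + threshold
                + ", startWithLinkedHashMap=" + startWithLinkedHashMap(threshold));
    }
}
```

Run with `-Dorg.apache.pdfbox.cos.mapThreshold=0` to opt into LinkedHashMap everywhere. Reading the property once into a static final field would keep the lookup off the hot path of dictionary construction.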
was:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
controls which Map class is used to optimize memory usage. By default, a
SmallMap is used. However, if the number of items in a COSDictionary reaches
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]
to a LinkedHashMap.
For larger documents, where the COSDictionary is expected to be substantial,
this copying occurs frequently. Additionally, [SmallMap.keySet is not efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
The attached screenshot shows pdfbox performance with SmallMap (in red) versus
using LinkedHashMap and ignoring the threshold (in green).
*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System
property?*
If set to 0, LinkedHashMap would be used. If not set, it would default to the
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.
> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> ------------------------------------------------------------------
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 3.0.3 PDFBox, 4.0.0
> Reporter: Jonathan Prates
> Priority: Minor
> Attachments: Screenshot 2024-05-21 at 11.00.25.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)