[ 
https://issues.apache.org/jira/browse/PDFBOX-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Doswald updated PDFBOX-3432:
------------------------------------
    Attachment: pdfbox-performance-PDFBOX-3432.zip
                PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch

This is my proposed implementation of the IntIntMap class. The patch also 
replaces the Map<Integer,Integer> instance variable in CmapSubtable with it.
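
For readers without the attachment, here is a minimal sketch of what a map from primitive int to primitive int can look like. It is not the code from the patch: the class name IntIntMapSketch, the open-addressing layout with linear probing, and the get(key, defaultValue) signature are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of an int-to-int hash map (NOT the class from the patch).
 * Open addressing with linear probing; no Integer objects are ever allocated.
 */
final class IntIntMapSketch {
    private int[] keys;
    private int[] values;
    private boolean[] used;   // marks occupied slots, so every int is a valid key
    private int size;

    IntIntMapSketch() {
        allocate(16);         // capacity is always a power of two
    }

    private void allocate(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
        size = 0;
    }

    /** Inserts the mapping or overwrites the value of an existing key. */
    void put(int key, int value) {
        if (size >= keys.length / 2) {
            resize();         // keep the load factor at or below 0.5
        }
        int i = findSlot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the mapped value, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        int i = findSlot(key);
        return used[i] ? values[i] : defaultValue;
    }

    boolean containsKey(int key) {
        return used[findSlot(key)];
    }

    int size() {
        return size;
    }

    /** Returns the slot holding key, or the first free slot on its probe path. */
    private int findSlot(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;             // multiplicative bit mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;               // linear probing
        }
        return i;
    }

    /** Doubles the capacity and re-inserts all existing entries. */
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        allocate(oldKeys.length * 2);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

The sketch has no remove operation, on the assumption that a cmap mapping is filled once while parsing and only read afterwards. A caller would write something like map.put(0x41, 36) and later map.get(0x41, 0), where the default of 0 is only an example value.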

The attached JMH benchmark simply parses the DejaVuSans.ttf font with the 
TTFParser. With only the CmapSubtable changes made so far, I get the 
following performance numbers:

Desktop
OLD: PdfBoxBenchmark.leadTTFFont  avgt   6.326 ± 0.119  ms/op
NEW: PdfBoxBenchmark.leadTTFFont  avgt   5.849 ± 0.156  ms/op

Embedded (i.MX6DL)
OLD: PdfBoxBenchmark.leadTTFFont  avgt  65.112 ± 1.368  ms/op
NEW: PdfBoxBenchmark.leadTTFFont  avgt  54.661 ± 2.402  ms/op

Since the code no longer uses autoboxing/unboxing, the allocation rate has 
also dropped (measurements from my desktop):

OLD:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate       avgt      771.634 ±   18.420  MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm  avgt  5109556.121 ± 1020.975    B/op

NEW:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate       avgt      506.081 ±   17.222  MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm  avgt  3117169.547 ± 7449.283    B/op

This patch does not exhaust the potential for optimizations of this kind. 
From just skimming the code, some further areas I could investigate:

* CmapSubtable.getCharacterCode also returns a boxed Integer. It seems to be 
used only in PDCIDFontType2Embedder; could it return a primitive int instead?
* PDCIDFontType2Embedder.buildSubset also uses Map<Integer,Integer>
* There are a lot of map objects that map an Integer to an object. Implementing 
a special mapping class for int to Object mappings (analogous to IntIntMap) 
may help here too
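
To make the last point concrete, an int to Object map could be sketched as follows. This is only an illustration: the class name IntObjectMapSketch and its methods are hypothetical and not part of the patch or of PDFBox, and the sketch assumes that null values never need to be stored, so that a null slot can mean "empty".

```java
/**
 * Hypothetical sketch of an int-to-Object hash map (not part of the patch).
 * Keys stay primitive; only the values are objects. Null values are rejected
 * so that a null slot can mark an empty entry.
 */
final class IntObjectMapSketch<V> {
    private int[] keys = new int[16];          // capacity is a power of two
    private Object[] values = new Object[16];
    private int size;

    /** Inserts the mapping or overwrites the value of an existing key. */
    void put(int key, V value) {
        if (value == null) {
            throw new NullPointerException("null values are not supported");
        }
        if (size >= keys.length / 2) {
            grow();                            // keep the load factor <= 0.5
        }
        int i = slotFor(key);
        if (values[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the mapped value, or null if the key is absent. */
    @SuppressWarnings("unchecked")
    V get(int key) {
        return (V) values[slotFor(key)];
    }

    int size() {
        return size;
    }

    /** Returns the slot holding key, or the first empty slot on its probe path. */
    private int slotFor(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;              // multiplicative bit mixing
        int i = (h ^ (h >>> 16)) & mask;
        while (values[i] != null && keys[i] != key) {
            i = (i + 1) & mask;                // linear probing
        }
        return i;
    }

    /** Doubles the capacity and moves all entries to their new slots. */
    private void grow() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new Object[oldValues.length * 2];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] != null) {
                int j = slotFor(oldKeys[i]);
                keys[j] = oldKeys[i];
                values[j] = oldValues[i];
            }
        }
    }
}
```

Compared with Map<Integer,V>, such a class would avoid boxing the keys and the per-entry node objects, but whether that pays off for any particular map in the code base would have to be measured.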

I'd be happy to hear your opinion on this patch and whether I should 
investigate further. 

Also: Is there a set of different fonts available to properly test all the 
processSubtypeX methods in CmapSubtable? I currently work with DejaVu, and the 
test code in fontbox works with LiberationSans, but I'm not sure whether these 
two fonts cover all the cases.


> Optimize CID to GlyphId mapping (TTF)
> -------------------------------------
>
>                 Key: PDFBOX-3432
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3432
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: FontBox
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 14.04.4 LTS
>            Reporter: Michael Doswald
>            Priority: Trivial
>              Labels: optimization, performance
>         Attachments: PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch, 
> pdfbox-performance-PDFBOX-3432.zip
>
>
> TTF fonts map code-points (Code IDs) to glyphs. These are mappings from int 
> to int. Because the JDK lacks map classes for primitive types, the code (e.g. 
> in CmapSubtable) currently uses Map<Integer,Integer> for those mappings. This 
> is inefficient in different ways:
> * Autoboxing/unboxing introduces a performance penalty
> * Boxing to Integer objects has a memory overhead
> * The JDK Map implementation has a big memory overhead for such simple objects
> For efficiency (execution time and memory consumption) I would propose to 
> introduce a simple IntIntMap implementation which works with primitive 
> integers.


