[ https://issues.apache.org/jira/browse/PDFBOX-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Doswald updated PDFBOX-3432:
------------------------------------
    Attachment: pdfbox-performance-PDFBOX-3432.zip
                PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch

This is my proposed implementation of the IntIntMap class. The patch also replaces the Map<Integer,Integer> instance variable in CmapSubtable with it.

The attached JMH benchmark simply parses the DejaVuSans.ttf font with the TTFParser. With the simple changes made to CmapSubtable so far, I get the following performance numbers:

Desktop
OLD: PdfBoxBenchmark.leadTTFFont  avgt   6.326 ± 0.119  ms/op
NEW: PdfBoxBenchmark.leadTTFFont  avgt   5.849 ± 0.156  ms/op

Embedded (i.MX6DL)
OLD: PdfBoxBenchmark.leadTTFFont  avgt  65.112 ± 1.368  ms/op
NEW: PdfBoxBenchmark.leadTTFFont  avgt  54.661 ± 2.402  ms/op

Since the code no longer uses autoboxing/unboxing, the allocation rate also dropped (measurements from my desktop):

OLD:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate       avgt      771.634 ±   18.420  MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm  avgt  5109556.121 ± 1020.975  B/op

NEW:
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate       avgt      506.081 ±   17.222  MB/sec
PdfBoxBenchmark.leadTTFFont:·gc.alloc.rate.norm  avgt  3117169.547 ± 7449.283  B/op

This patch does not fully exploit the potential for optimizations of this kind. From just skimming the code, some more areas that I could investigate:
* CmapSubtable.getCharacterCode also returns a boxed Integer. This seems to be used in PDCIDFontType2Embedder only and could also be done with a primitive int?
* PDCIDFontType2Embedder.buildSubset also uses Map<Integer,Integer>
* There are a lot of map objects that map an Integer to an object. Implementing a special mapping class for int to Object mappings (analogous to IntIntMap) may help here too

I'd be happy to hear your opinion on this patch and whether I should investigate further.

Also: Is there a set of different fonts available to properly test all the processSubtypeX methods in CmapSubtable?
I currently work with DejaVu, and the test code in fontbox works with LiberationSans; I'm not sure whether this tests all the cases.

> Optimize CID to GlyphId mapping (TTF)
> -------------------------------------
>
>                 Key: PDFBOX-3432
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3432
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: FontBox
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 14.04.4 LTS
>            Reporter: Michael Doswald
>            Priority: Trivial
>              Labels: optimization, performance
>         Attachments: PDFBOX-3432_Optimize_CID_to_GlyphId_mapping_rev1.patch,
>                      pdfbox-performance-PDFBOX-3432.zip
>
>
> TTF fonts map code-points (Code IDs) to glyphs. These are mappings from int
> to int. Because the JDK lacks map classes for primitive types, the code (e.g.
> in CmapSubtable) currently uses Map<Integer,Integer> for those mappings. This
> is inefficient in several ways:
> * Autoboxing/unboxing introduces a performance penalty
> * Boxing to Integer objects has a memory overhead
> * The JDK Map implementation has a big memory overhead for such simple objects
> For efficiency (execution time and memory consumption) I would propose to
> introduce a simple IntIntMap implementation which works with primitive
> integers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
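
For readers without access to the attached patch, the idea of a primitive int-to-int map described in the issue can be illustrated with a minimal sketch. This is not the code from the patch: the class body, the open-addressing scheme with linear probing, the load factor, the hash mixing and all method names below are assumptions chosen for illustration, and the sample code points and glyph ids are made up. Removal is deliberately left out to keep the sketch short.

```java
/**
 * Hypothetical sketch of an int-to-int hash map (not the PDFBOX-3432 patch).
 * Keys and values live in plain int arrays, so no Integer objects are allocated.
 */
final class IntIntMap {
    private int[] keys;
    private int[] values;
    private boolean[] used; // marks occupied slots, so any int (including 0) can be a key
    private int size;

    IntIntMap(int expectedSize) {
        if (expectedSize < 0 || expectedSize > (1 << 29)) {
            throw new IllegalArgumentException("unsupported expectedSize: " + expectedSize);
        }
        int capacity = 16;
        // Power-of-two capacity, sized so that the load factor stays at or below 0.5.
        while (capacity < 2L * expectedSize) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    int size() {
        return size;
    }

    /** Inserts the mapping or overwrites the value of an existing key. */
    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            resize();
        }
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                values[i] = value;
                return;
            }
            i = (i + 1) & (keys.length - 1); // linear probing with wrap-around
        }
        used[i] = true;
        keys[i] = key;
        values[i] = value;
        size++;
    }

    /** Returns the mapped value, or defaultValue if the key is absent. */
    int get(int key, int defaultValue) {
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return defaultValue;
    }

    private int slot(int key) {
        int h = key * 0x9E3779B9; // multiplicative mixing to spread consecutive keys
        return (h ^ (h >>> 16)) & (keys.length - 1);
    }

    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    public static void main(String[] args) {
        IntIntMap codeToGlyph = new IntIntMap(4);
        codeToGlyph.put(0x41, 36); // made-up glyph ids, for illustration only
        codeToGlyph.put(0x42, 37);
        System.out.println(codeToGlyph.get(0x41, -1)); // prints 36
        System.out.println(codeToGlyph.get(0x5A, -1)); // prints -1 (no mapping)
    }
}
```

Returning a caller-supplied default from get avoids a boxed Integer or null to signal a missing key, which is the kind of allocation the issue aims to remove. Whether the actual patch handles missing keys this way is not stated in the message.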