[jira] [Updated] (PDFBOX-3418) Slow string to hex conversion in ToUnicodeWriter

Michael Doswald (JIRA) Tue, 12 Jul 2016 09:07:49 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Doswald updated PDFBOX-3418:
------------------------------------
    Attachment: PDFBOX-3418_ToUnicodeWriter_performance_rev1.patch
                PDFBOX-3418_PerformanceTest.zip

Added a proposed patch that speeds up ToUnicodeWriter, together with a JMH 
benchmark project.

In my tests the changes have only a modest impact on desktop computers 
(roughly 21% less time per operation), but a significant impact on my 
embedded system (i.MX6 DL, roughly 38% less time per operation):

Desktop:
OLD: PdfBoxBenchmark.loadEmbeddedFont  avgt   10  76.644 ± 1.295  ms/op
NEW: PdfBoxBenchmark.loadEmbeddedFont  avgt   10  60.510 ± 1.265  ms/op

Embedded:
OLD: PdfBoxBenchmark.loadEmbeddedFont  avgt   10  1075.366 ± 32.550  ms/op
NEW: PdfBoxBenchmark.loadEmbeddedFont  avgt   10  665.002 ± 31.051  ms/op

The allocation rate has also decreased significantly: it roughly halved, and 
the bytes allocated per operation dropped by about 60%. Below are the 
measurements on my desktop system.

OLD:
PdfBoxBenchmark.loadEmbeddedFont:·gc.alloc.rate       avgt   10       542.965 ±   9.165  MB/sec
PdfBoxBenchmark.loadEmbeddedFont:·gc.alloc.rate.norm  avgt   10  43351852.752 ± 437.214    B/op

NEW:
PdfBoxBenchmark.loadEmbeddedFont:·gc.alloc.rate       avgt   10       273.424 ±   9.800  MB/sec
PdfBoxBenchmark.loadEmbeddedFont:·gc.alloc.rate.norm  avgt   10  17341439.557 ± 474.988    B/op


> Slow string to hex conversion in ToUnicodeWriter
> ------------------------------------------------
>
>                 Key: PDFBOX-3418
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3418
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel
>    Affects Versions: 2.0.2
>         Environment: Ubuntu 14.04 LTS
>            Reporter: Michael Doswald
>            Priority: Trivial
>              Labels: optimization, performance
>         Attachments: PDFBOX-3418_PerformanceTest.zip, 
> PDFBOX-3418_ToUnicodeWriter_performance_rev1.patch
>
>
> The ToUnicodeWriter.writeTo(OutputStream) method converts a lot of shorts and 
> strings to hexadecimal strings. This is done with String.format, which is 
> not very efficient. 
> The ToUnicodeWriter.toHex(int) and ToUnicodeWriter.stringToHex(String) 
> methods could be rewritten to build a char array directly, which would be 
> more efficient.
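
The idea in the description can be sketched roughly as follows. This is not 
the attached patch, only a minimal illustration of the char-array approach. 
It assumes each value is written as four uppercase hex digits (the equivalent 
of String.format("%04X", ...)); the class name HexSketch and the HEX_DIGITS 
table are hypothetical.

```java
// Hypothetical sketch, not the code from the attached patch.
final class HexSketch {
    // Lookup table: index 0..15 -> uppercase hex digit.
    private static final char[] HEX_DIGITS = "0123456789ABCDEF".toCharArray();

    // Formats the low 16 bits of value as exactly four hex digits.
    // Assumption: callers only pass values in the range 0..0xFFFF.
    static String toHex(int value) {
        char[] out = new char[4];
        out[0] = HEX_DIGITS[(value >>> 12) & 0xF];
        out[1] = HEX_DIGITS[(value >>> 8) & 0xF];
        out[2] = HEX_DIGITS[(value >>> 4) & 0xF];
        out[3] = HEX_DIGITS[value & 0xF];
        return new String(out);
    }

    // Converts every UTF-16 code unit of text to four hex digits,
    // writing into one preallocated array instead of formatting per char.
    static String stringToHex(String text) {
        char[] out = new char[text.length() * 4];
        int pos = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out[pos++] = HEX_DIGITS[(c >>> 12) & 0xF];
            out[pos++] = HEX_DIGITS[(c >>> 8) & 0xF];
            out[pos++] = HEX_DIGITS[(c >>> 4) & 0xF];
            out[pos++] = HEX_DIGITS[c & 0xF];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(0x1F));        // prints 001F
        System.out.println(stringToHex("AB"));  // prints 00410042
    }
}
```

Compared with String.format, this avoids parsing a format string and creating 
formatter objects on every call, which would be consistent with the lower 
allocation rate reported above.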



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
