[ 
https://issues.apache.org/jira/browse/PDFBOX-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164383#comment-17164383
 ] 

Alfred commented on PDFBOX-4909:
--------------------------------

The map iterations and computeIfAbsent calls are indeed coming from
processTextPosition, when suppressDuplicateOverlappingText is set.
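For context, here is a simplified sketch of why duplicate suppression adds map work for every glyph. This is not PDFBox's code. `DuplicateSuppressor` and `accept` are hypothetical names, and the tolerance of one third of the average character width is an assumption made for the sketch. Each glyph does a computeIfAbsent lookup plus a range scan over nearby positions, which is the kind of cost that shows up in a profile once suppressDuplicateOverlappingText is enabled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical, simplified duplicate-overlapping-text filter.
 * Every call performs a map lookup and a range scan, i.e. per-glyph work.
 */
final class DuplicateSuppressor {
    // unicode text -> (x position -> set of y positions) already emitted
    private final Map<String, TreeMap<Float, TreeSet<Float>>> seen = new HashMap<>();

    /** Returns true if the glyph should be shown, false if it duplicates an earlier one. */
    boolean accept(String unicode, float x, float y, float width) {
        TreeMap<Float, TreeSet<Float>> byX =
                seen.computeIfAbsent(unicode, k -> new TreeMap<>());
        // Assumed tolerance: one third of the average character width.
        float tolerance = width / unicode.length() / 3.0f;
        NavigableMap<Float, TreeSet<Float>> xMatches =
                byX.subMap(x - tolerance, true, x + tolerance, true);
        for (TreeSet<Float> ys : xMatches.values()) {
            NavigableSet<Float> yMatches = ys.subSet(y - tolerance, true, y + tolerance, true);
            if (!yMatches.isEmpty()) {
                return false; // same text already drawn at (nearly) the same position
            }
        }
        // Remember this position so later overlapping copies are suppressed.
        byX.computeIfAbsent(x, k -> new TreeSet<>()).add(y);
        return true;
    }
}
```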

I forgot that I did not have that setting enabled in my original tests, but it
is now clear that it is not related to the WeakHashMap.

 

The performance with the WeakHashMap is even better than with capturing font
changes: depending on memory pressure, it may compute the height only once per
font.
Since the tests passed, I vote for your solution too.
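A minimal sketch of the WeakHashMap idea, assuming the font object itself is the cache key. `FontHeightCache` and `computeHeight` are hypothetical names, and a generic key type stands in for the PDFBox font classes so that the example runs on its own.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical per-font memoization of an expensive height calculation.
 * Keys are held weakly: once a font object is no longer strongly referenced
 * elsewhere, its entry can be garbage-collected.
 */
final class FontHeightCache<F> {
    private final Map<F, Float> heights = new WeakHashMap<>();
    private final ToDoubleFunction<F> computeHeight; // stand-in for the heavy computation

    FontHeightCache(ToDoubleFunction<F> computeHeight) {
        this.computeHeight = computeHeight;
    }

    /** Returns the cached height, computing it only if this font has no entry. */
    float heightOf(F font) {
        return heights.computeIfAbsent(font, f -> (float) computeHeight.applyAsDouble(f));
    }
}
```

While a font object stays strongly reachable, its height is computed once and then reused for every glyph. If that object is released and the font is later loaded again as a new object, the lookup misses and the height is recomputed, so the number of computations depends on how long font objects are kept alive.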

> Don't calculate font height for every glyph
> -------------------------------------------
>
>                 Key: PDFBOX-4909
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4909
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.0, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: Optimization
>             Fix For: 2.0.21, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-4909.patch, Untitled.png, WithCapturingSetFontAndSize.png
>
>
> LegacyPDFStreamEngine computes the font height for every glyph, and the
> computation is rather heavy because it has to work around all known problems.
> Instead of computing it for every glyph, we can recompute it only when the font
> changes. The SetFontAndSize operator is invoked when the font changes, so we
> can use it to compute and store the height and have it ready when needed in
> showGlyph.
>  
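For comparison, the approach proposed in the issue (compute the height when the font changes, reuse it in showGlyph) could look roughly like this. `CapturingEngine`, `onSetFontAndSize` and `showGlyph` are hypothetical stand-ins, not LegacyPDFStreamEngine's actual methods, and scaling by the font size is left out of the sketch.

```java
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch of the "capture font changes" alternative: the height is
 * computed once when the font is selected and reused for every glyph until the
 * next font change.
 */
final class CapturingEngine<F> {
    private final ToDoubleFunction<F> computeHeight; // stand-in for the heavy computation
    private float currentHeight;                     // height of the currently selected font

    CapturingEngine(ToDoubleFunction<F> computeHeight) {
        this.computeHeight = computeHeight;
    }

    /** Hypothetical hook, called once per font change (SetFontAndSize). */
    void onSetFontAndSize(F font) {
        currentHeight = (float) computeHeight.applyAsDouble(font);
    }

    /** Called for every glyph; it only reads the stored height. */
    float showGlyph() {
        return currentHeight;
    }
}
```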



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
