[ 
https://issues.apache.org/jira/browse/PDFBOX-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163428#comment-17163428
 ] 

Michael Klink edited comment on PDFBOX-4909 at 7/23/20, 11:07 AM:
------------------------------------------------------------------

{quote}Do you happen to have a PDF that would exhibit that problem?
{quote}
No, I don't, at least not that I'm aware of. I merely stumbled over the problem 
while looking at the code: storing a datum based on the current graphics state 
in a text stripper member seemed outright wrong.
{quote}It would be great if the height was saved in the font.
{quote}
On the one hand *yes, indeed,* as it really only depends on the font in 
question. On the other hand, though, *no, please not,* as this number is an 
artificial value which is tightly coupled with the text extraction code of 
{{LegacyPDFStreamEngine}} and {{PDFTextStripper}}, optimized for this usage by 
trial and error, and not necessarily meaningful beyond it.

Furthermore, an advantage of the current solution is the option of *overriding* 
the calculation of this value (see [this Stack Overflow 
answer|https://stackoverflow.com/a/63052240/1729265]). That option can indeed 
make sense and should therefore remain.
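
To illustrate how caching the height per font and keeping the calculation overridable can coexist, here is a minimal, self-contained sketch. All names in it ({{SketchFont}}, {{SketchExtractor}}, {{FullHeightExtractor}}) are hypothetical stand-ins and not PDFBox classes, and the "half the bounding box height" formula is only a placeholder heuristic assumed for the example, not the actual calculation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a font; NOT a PDFBox class.
final class SketchFont {
    final String name;
    final float bboxHeight; // bounding box height in 1/1000 text space units

    SketchFont(String name, float bboxHeight) {
        this.name = name;
        this.bboxHeight = bboxHeight;
    }
}

// Hypothetical extractor that keeps one cached height per font object
// instead of a single member tied to the current graphics state.
class SketchExtractor {
    private final Map<SketchFont, Float> heightCache = new HashMap<>();
    int computations = 0; // counts how often the expensive path runs

    // Overridable calculation; the formula is a placeholder assumption.
    protected float computeFontHeight(SketchFont font) {
        computations++;
        return font.bboxHeight / 2f / 1000f;
    }

    // Called once per glyph; only the first call per font computes.
    final float fontHeightFor(SketchFont font) {
        Float height = heightCache.get(font);
        if (height == null) {
            height = computeFontHeight(font);
            heightCache.put(font, height);
        }
        return height;
    }
}

// A subclass that substitutes its own calculation (full bounding box height).
class FullHeightExtractor extends SketchExtractor {
    @Override
    protected float computeFontHeight(SketchFont font) {
        computations++;
        return font.bboxHeight / 1000f;
    }
}
```

Because the cache is keyed by the font object, it does not matter which font the graphics state currently selects, and a subclass can still replace the heuristic without touching the caching logic.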



> Don't calculate font height for every glyph
> -------------------------------------------
>
>                 Key: PDFBOX-4909
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4909
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 2.0.0, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: Optimization
>             Fix For: 2.0.21, 3.0.0 PDFBox
>
>         Attachments: PDFBOX-4909.patch
>
>
> LegacyPDFStreamEngine computes the font height for every glyph, and the 
> computation is rather heavy because it works around all known problems.
> Instead of computing it for every glyph, we can recompute it only when the 
> font changes. The SetFontAndSize operator is invoked when the font changes, 
> so we can use it to compute and store the height, ready for use in 
> showGlyph.
>  


