[
https://issues.apache.org/jira/browse/PDFBOX-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-3405:
------------------------------------
Description:
I (along with others) have found the font size of text to be very useful when
trying to recover the structure of PDFs, for example in heuristics like 'text
with a large font size is probably a title'. However, I noticed a few cases
where {{getFontSizeInPt}} or {{getFontSize}} return seemingly very inaccurate
results. For example, in the attached PDF the {{getFontSizeInPt}} for the
title text is over 500.
After digging into this a little, as I understand it neither of these methods
returns a font size scaled to the display space. {{getFontSize}} returns the
"raw" encoded font size and {{getFontSizeInPt}} returns the font size scaled by
the text matrix, but not by the current transformation matrix.
Basically, in order to get reliable font size information, it would be helpful
if either
1) {{getFontSizeInPt}} included the effect of the current transformation
matrix, or
2) a new method like {{getDisplayFontSize}} were added that returns the font
size scaled to the display space
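To illustrate what I mean by "scaled to the display space", here is a minimal
sketch in plain Java. It is not PDFBox code: the class and method names are
hypothetical, the matrices are modelled with {{java.awt.geom.AffineTransform}},
and the numbers are made up rather than taken from the attached PDF. It
assumes the display size is the raw font size multiplied by the vertical scale
of the text matrix combined with the current transformation matrix.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical helper for illustration only; not part of PDFBox.
final class DisplayFontSizeSketch {

    // Returns the raw font size scaled by the vertical scale of
    // (text matrix followed by current transformation matrix).
    static double displayFontSize(double rawFontSize,
                                  AffineTransform textMatrix,
                                  AffineTransform ctm) {
        // Text space is mapped by the text matrix first, then by the CTM.
        AffineTransform combined = new AffineTransform(ctm);
        combined.concatenate(textMatrix);
        // Transform the unit "up" vector, ignoring translation. Its length
        // is the vertical scale, which also holds for rotated text.
        Point2D up = combined.deltaTransform(new Point2D.Double(0, 1), null);
        return rawFontSize * up.distance(0, 0);
    }

    public static void main(String[] args) {
        // Made-up values: a raw size of 1, a text matrix that scales by 500,
        // and a CTM that scales the whole page content by 0.02.
        double raw = 1.0;
        AffineTransform textMatrix = new AffineTransform(500, 0, 0, 500, 0, 0);
        AffineTransform ctm = new AffineTransform(0.02, 0, 0, 0.02, 0, 0);

        // Scaling by the text matrix only, which is how I understand
        // getFontSizeInPt to behave: about 500.
        System.out.println(displayFontSize(raw, textMatrix, new AffineTransform()));
        // Scaling by the text matrix and the CTM: about 10.
        System.out.println(displayFontSize(raw, textMatrix, ctm));
    }
}
```

With values like these, scaling by the text matrix alone gives a size of about
500, while also applying the current transformation matrix gives about 10,
which is closer to what one would see when opening the document.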
As a side note, I have seen several users (including myself) assume that
{{getFontSize}} returns the font size as would be observed when one opens the
PDF, and then be confused when these methods occasionally do not return the
expected results. I think {{getFontSize}} would benefit from a clear note that
the results might not include scaling factors that were used when the text was
rendered.
was:
I (along with others) have found using the font size of text to be very useful
when doing things like trying to recover the structure of PDFs. For example, in
heuristics like 'text with large font sizes are probably titles'. However, I
noticed a few cases where getFontSizePt or getFontSize return seemingly very
inaccurate results. For example, in the attached pdf the getFontSizePt for the
title text is over 500.
After digging into this a little, as I understand it neither of these methods
return the a font size scaled to the display space. getFontSize returns the
"raw" encoded font size and getFontSizePt returns the font size scaled by the
text matrix, but not by the current transformation matrix.
Basically, in order to get reliable font information, it would be helpful if
either
1) getFontSizePt includes the affect of using current transformation matrix
2) A new method like "getDisplayFontSize" is added that returns the font sizes
scaled to the display space
As a side note, I have seen several users (including myself), assume that
"getFontSize" returns the font size as would be observed when one opens the
PDF, and the been confused when these method occasionally do not return the
results expected. I think "getFontSize" would benefit from a clear note that
the results might not include scaling factors that were used when the text was
rendered.
> Display font size
> -----------------
>
> Key: PDFBOX-3405
> URL: https://issues.apache.org/jira/browse/PDFBOX-3405
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Christopher Clark
> Attachments: bad-font-p1.pdf