Christopher Clark created PDFBOX-3405:
-----------------------------------------

             Summary: Display font size
                 Key: PDFBOX-3405
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3405
             Project: PDFBox
          Issue Type: Improvement
            Reporter: Christopher Clark
         Attachments: bad-font-p1.pdf

I (along with others) have found the font size of text to be very useful 
when trying to recover the structure of PDFs, for example in heuristics like 
'text with a large font size is probably a title'. However, I noticed a few 
cases where getFontSizePt or getFontSize returns seemingly very inaccurate 
results. For example, in the attached PDF the getFontSizePt for the title 
text is over 500.

After digging into this a little, as I understand it, neither of these methods 
returns a font size scaled to the display space. getFontSize returns the 
"raw" encoded font size, and getFontSizePt returns the font size scaled by the 
text matrix but not by the current transformation matrix.

Basically, in order to get reliable font information, it would be helpful if 
either
1) getFontSizePt included the effect of the current transformation matrix, or
2) a new method like "getDisplayFontSize" were added that returns the font size 
scaled to the display space.
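
To make the difference between the three sizes concrete, here is a minimal 
sketch of the arithmetic. It is not PDFBox code: the class name, the helper 
names, and the sample matrices are made up for illustration, and 
java.awt.geom.AffineTransform stands in for the PDF text matrix and current 
transformation matrix. With a raw font size of 1, a text matrix that scales by 
512, and a CTM that scales by 0.03, the text-matrix-scaled size is 512 while 
the size in display space is about 15.36.

```java
import java.awt.geom.AffineTransform;

public class DisplayFontSizeSketch {

    // Vertical scaling factor of a transform: the length of the image of the
    // direction vector (0, 1). Using the length keeps the result meaningful
    // when the transform also rotates the text.
    static double verticalScale(AffineTransform m) {
        return Math.hypot(m.getShearX(), m.getScaleY());
    }

    // Hypothetical helper: raw font size scaled by the text matrix only.
    static double textMatrixFontSize(double rawFontSize, AffineTransform textMatrix) {
        return rawFontSize * verticalScale(textMatrix);
    }

    // Hypothetical "display" font size: raw size scaled by the text matrix
    // and then by the current transformation matrix (CTM).
    static double displayFontSize(double rawFontSize,
                                  AffineTransform textMatrix,
                                  AffineTransform ctm) {
        // combined = ctm * textMatrix, i.e. the text matrix is applied first.
        AffineTransform combined = new AffineTransform(ctm);
        combined.concatenate(textMatrix);
        return rawFontSize * verticalScale(combined);
    }

    public static void main(String[] args) {
        // Assumed sample values, chosen only to mimic a "font size over 500" case.
        double rawFontSize = 1.0;
        AffineTransform textMatrix = AffineTransform.getScaleInstance(512, 512);
        AffineTransform ctm = AffineTransform.getScaleInstance(0.03, 0.03);

        System.out.println("raw size:          " + rawFontSize);
        System.out.println("text matrix only:  " + textMatrixFontSize(rawFontSize, textMatrix));
        System.out.println("display space:     " + displayFontSize(rawFontSize, textMatrix, ctm));
    }
}
```

A real implementation would take its matrices from the graphics state at the 
time the text is shown. The sketch only shows why a value scaled by the text 
matrix alone can be far from the size a reader sees on screen.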

As a side note, I have seen several users (including myself) assume that 
"getFontSize" returns the font size as would be observed when one opens the 
PDF, and then be confused when the method occasionally does not return the 
expected results. I think "getFontSize" would benefit from a clear note that 
the results might not include scaling factors that were used when the text was 
rendered.


