[
https://issues.apache.org/jira/browse/PDFBOX-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357393#comment-15357393
]
Tilman Hausherr commented on PDFBOX-3405:
-----------------------------------------
In the attached file, I don't get 500 for the title (HYBRID...). Here's what I
get (fspt):
{code}
String[101.6402,77.39978 fs=1.0 fspt=23.0 xscale=23.9104 height=16.163431
space=5.9776 width=17.263306]H
{code}
But I agree with you, the javadoc needs improvement. Here's a small test PDF
content stream and the result:
{code}
stream.saveGraphicsState();
stream.transform(Matrix.getScaleInstance(3, 3));
stream.setFont(PDType1Font.HELVETICA, 5);
stream.beginText();
stream.setTextMatrix(Matrix.getScaleInstance(2, 2));
stream.newLineAtOffset(0, 50);
stream.showText("huhu");
stream.endText();
stream.restoreGraphicsState();
q
3 0 0 3 0 0 cm
/F1 5 Tf
BT
2 0 0 2 0 0 Tm
0 50 Td
(huhu) Tj
ET
Q
{code}
The output is (I modified the code to output more):
{code}
System.out.println("String[" + text.getXDirAdj() + ","
+ text.getYDirAdj() + " fs=" + text.getFontSize() + " fspt=" +
text.getFontSizeInPt() + " Tm=" + text.getTextMatrix() + " xscale="
+ text.getXScale() + " height=" + text.getHeightDir() + " space="
+ text.getWidthOfSpace() + " width="
+ text.getWidthDirAdj() + "]" + text.getUnicode());
String[0.0,541.8898 fs=5.0 fspt=10.0 Tm=[30.0,0.0,0.0,30.0,0.0,300.0]
xscale=30.0 height=17.34 space=8.340001 width=16.68]h
String[16.68,541.8898 fs=5.0 fspt=10.0 Tm=[30.0,0.0,0.0,30.0,16.68,300.0]
xscale=30.0 height=17.34 space=8.340001 width=16.68]u
String[33.36,541.8898 fs=5.0 fspt=10.0 Tm=[30.0,0.0,0.0,30.0,33.36,300.0]
xscale=30.0 height=17.34 space=8.340001 width=16.68]h
String[50.04,541.8898 fs=5.0 fspt=10.0 Tm=[30.0,0.0,0.0,30.0,50.04,300.0]
xscale=30.0 height=17.34 space=8.340001 width=16.68]u
{code}
So "Tm" is misleading here: it is the effective text rendering matrix and not
the matrix set by the "Tm" operator. fs and fspt are kind of useless.
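To see where the 30 and the 300 come from, here is a rough sketch that redoes the matrix arithmetic for the test stream by hand. It is plain Java and not PDFBox code; the class and method names are made up for illustration, and it assumes 100% horizontal scaling, no text rise and a cropbox at the origin.

```java
// Sketch only: plain-Java arithmetic, no PDFBox classes involved.
// Matrices are {a, b, c, d, e, f} as in the PDF "cm" and "Tm" operands.
class EffectiveMatrixSketch {

    // Returns m x n, i.e. the transform that applies m first, then n.
    static float[] multiply(float[] m, float[] n) {
        return new float[] {
            m[0] * n[0] + m[1] * n[2],
            m[0] * n[1] + m[1] * n[3],
            m[2] * n[0] + m[3] * n[2],
            m[2] * n[1] + m[3] * n[3],
            m[4] * n[0] + m[5] * n[2] + n[4],
            m[4] * n[1] + m[5] * n[3] + n[5]
        };
    }

    // Effective matrix for the test stream: font size x text matrix x CTM.
    static float[] effectiveMatrix() {
        float[] ctm = {3, 0, 0, 3, 0, 0};      // 3 0 0 3 0 0 cm
        float[] fontSize = {5, 0, 0, 5, 0, 0}; // /F1 5 Tf
        float[] tm = {2, 0, 0, 2, 0, 0};       // 2 0 0 2 0 0 Tm
        float[] td = {1, 0, 0, 1, 0, 50};      // 0 50 Td
        tm = multiply(td, tm);                 // Td moves in text space
        return multiply(multiply(fontSize, tm), ctm);
    }

    public static void main(String[] args) {
        // prints [30.0, 0.0, 0.0, 30.0, 0.0, 300.0]
        System.out.println(java.util.Arrays.toString(effectiveMatrix()));
    }
}
```

The result matches the "Tm" value printed for the first glyph: 5 (Tf) times 2 (Tm) times 3 (cm) gives the scale 30, and the Td offset of 50 is scaled by 2 and then by 3 to give 300.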
Proposal for javadoc, please tell me if this is an improvement:
- {{getFontSize()}}: This will get the font size that has been set with the
"Tf" operator (Set text font and size). It may appear bigger or smaller
depending on the current transformation matrix and the text matrix.
- {{getFontSizeInPt()}}: This will get the font size in pt. To get this size we
multiply the font size (\{@link #getFontSize()}) by the horizontal scaling
factor of the text matrix (set by the "Tm" operator) and truncate the result to
an integer. The actual rendering may appear bigger or smaller depending on the
current transformation matrix (set by the "cm" operator).
- {{getTextMatrix()}}: The matrix containing the starting text position and
scaling. Despite the name, it is not the matrix set by the "Tm" operator; it is
the effective text rendering matrix (which depends on the current
transformation matrix, the text matrix, the font size and the page cropbox).
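The rule described for {{getFontSizeInPt()}} can be sketched as plain arithmetic. This is only an illustration of the wording above, not the PDFBox source; the class and method names are hypothetical.

```java
// Sketch only: mirrors the proposed javadoc wording, not PDFBox itself.
class FontSizeInPtSketch {

    // Font size from "Tf" times the horizontal scale of the "Tm" matrix,
    // truncated to an integer. The "cm" matrix is deliberately ignored.
    static float fontSizeInPt(float fontSize, float tmScaleX) {
        return (int) (fontSize * tmScaleX);
    }

    public static void main(String[] args) {
        // Test stream: /F1 5 Tf and 2 0 0 2 0 0 Tm
        System.out.println(fontSizeInPt(5f, 2f)); // prints 10.0
    }
}
```

For the test stream this gives 10.0, the same as the fspt value in the output, while the rendered scale is 30 because of the "cm" operator.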
Re {{getDisplayFontSize}}: you can get the scale from {{text.getTextMatrix()}}.
However this might still not help - a font could have huge glyphs.
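As a rough sketch of what "get the scale" could mean: take the effective matrix as six numbers and compute its vertical scaling factor. The class and method names are hypothetical and the float array stands in for the PDFBox matrix; with PDFBox itself, {{text.getTextMatrix().getScalingFactorY()}} should give a comparable value, but please check that against your version.

```java
// Sketch only: the matrix is {a, b, c, d, e, f}; no PDFBox classes are used.
class DisplayFontSizeSketch {

    // Length of the transformed unit y vector (c, d); this stays the same
    // when the text is rotated.
    static float verticalScale(float[] m) {
        return (float) Math.hypot(m[2], m[3]);
    }

    public static void main(String[] args) {
        float[] fromTestOutput = {30f, 0f, 0f, 30f, 0f, 300f};
        System.out.println(verticalScale(fromTestOutput)); // prints 30.0
    }
}
```

This only reports the scale applied to the glyph space, so it says nothing about how large the glyphs of a particular font are drawn inside that space.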
> Display font size
> -----------------
>
> Key: PDFBOX-3405
> URL: https://issues.apache.org/jira/browse/PDFBOX-3405
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Christopher Clark
> Attachments: bad-font-p1.pdf
>
>
> I (along with others) have found using the font size of text to be very
> useful when doing things like trying to recover the structure of PDFs. For
> example, in heuristics like 'text with large font sizes are probably titles'.
> However, I noticed a few cases where {{getFontSizeInPt}} or {{getFontSize}}
> return seemingly very inaccurate results. For example, in the attached pdf
> the {{getFontSizeInPt}} for the title text is over 500.
> After digging into this a little, as I understand it neither of these methods
> returns a font size scaled to the display space. {{getFontSize}} returns
> the "raw" encoded font size and {{getFontSizeInPt}} returns the font size
> scaled by the text matrix, but not by the current transformation matrix.
> Basically, in order to get reliable font information, it would be helpful if
> either
> 1) {{getFontSizeInPt}} includes the effect of using the current
> transformation matrix
> 2) A new method like {{getDisplayFontSize}} is added that returns the font
> size scaled to the display space
> As a side note, I have seen several users (including myself) assume that
> {{getFontSize}} returns the font size as would be observed when one opens the
> PDF, and then be confused when these methods occasionally do not return the
> results expected. I think {{getFontSize}} would benefit from a clear note
> that the results might not include scaling factors that were used when the
> text was rendered.