[
https://issues.apache.org/jira/browse/PDFBOX-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992252#comment-14992252
]
John Hewson edited comment on PDFBOX-3081 at 11/5/15 7:26 PM:
--------------------------------------------------------------
-Those are the "correct" bbox values for that font, i.e. that's what's embedded
in the PDF. Given that the bbox is just metadata which doesn't affect
rendering, there's no incentive for them to be accurate.-
Except they're not: PDFBox is returning the bbox from the embedded font
files, but the Font dictionary contains a smaller BBox entry. It looks like we
should be using that preferentially.
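A minimal sketch of that preference, assuming both boxes have already been read into a common form. `FontBBoxChooser` and `Box` are hypothetical stand-ins, not PDFBox classes; in PDFBox 2.0 the dictionary value would come from something like `PDFontDescriptor.getFontBoundingBox()` (for Type 3 fonts the FontBBox sits in the font dictionary itself). Falling back to the embedded font when the dictionary box is missing or has zero area is an assumption, not something the comment specifies.

```java
// Sketch only: Box and both inputs are hypothetical stand-ins, not PDFBox types.
final class FontBBoxChooser {

    /** Axis-aligned box in glyph space (assumed 1/1000 em units). */
    record Box(float llx, float lly, float urx, float ury) {
        /** A zero-area or inverted box is treated as missing. */
        boolean isUsable() {
            return urx > llx && ury > lly;
        }
    }

    private FontBBoxChooser() {}

    /**
     * Prefer the bbox from the PDF font dictionary / font descriptor and fall
     * back to the embedded font program's bbox only when the dictionary value
     * is absent or degenerate (assumed fallback rule).
     */
    static Box choose(Box fromDictionary, Box fromEmbeddedFont) {
        if (fromDictionary != null && fromDictionary.isUsable()) {
            return fromDictionary;
        }
        return fromEmbeddedFont;
    }
}
```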
was (Author: jahewson):
Those are the "correct" bbox values for that font, i.e. that's what's embedded in
the PDF. Given that the bbox is just metadata which doesn't affect rendering,
there's no incentive for them to be accurate. Even so, the bbox is the upper
bound on the visual area of all glyphs in the font, so they may well be
perfectly accurate once diacritics are taken into account.
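To illustrate why a large bbox can still be "correct": a font bbox is the union of all glyph boxes, so a single tall glyph such as an accented capital raises the top edge for the whole font. The sketch below computes that union; `FontBBoxUnion` is a hypothetical name, and the glyph boxes in the usage note are made-up values in 1/1000 em units, not taken from the font in this issue.

```java
import java.util.List;

// Sketch only: the glyph boxes are hypothetical inputs, not read from a real font.
final class FontBBoxUnion {

    record Box(float llx, float lly, float urx, float ury) {}

    private FontBBoxUnion() {}

    /** Smallest box enclosing every glyph box, i.e. the upper bound a font bbox describes. */
    static Box union(List<Box> glyphBoxes) {
        if (glyphBoxes.isEmpty()) {
            throw new IllegalArgumentException("no glyph boxes");
        }
        float llx = Float.POSITIVE_INFINITY;
        float lly = Float.POSITIVE_INFINITY;
        float urx = Float.NEGATIVE_INFINITY;
        float ury = Float.NEGATIVE_INFINITY;
        for (Box b : glyphBoxes) {
            llx = Math.min(llx, b.llx());
            lly = Math.min(lly, b.lly());
            urx = Math.max(urx, b.urx());
            ury = Math.max(ury, b.ury());
        }
        return new Box(llx, lly, urx, ury);
    }
}
```

For example, with hypothetical boxes a = (30, -10, 480, 460), g = (20, -220, 500, 470) and Å = (10, 0, 650, 900), the union is (10, -220, 650, 900); without Å the top edge would be 470.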
If improving text extraction is the goal, then it might be that we want to work
with the visual bounds of the glyphs, instead of their logical bounds.
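The difference between logical and visual bounds can be made concrete for a single unrotated, horizontally written glyph. This is a sketch under stated assumptions: glyph-space units of 1/1000 em, a y-up coordinate system, and ascent/descent values taken from the font descriptor. `GlyphBounds` and its parameters are hypothetical names, not PDFBox API.

```java
// Sketch only: assumes 1/1000 em glyph units, y-up coordinates, no rotation or skew.
final class GlyphBounds {

    record Rect(double x, double y, double width, double height) {}

    private GlyphBounds() {}

    /** Logical bounds: the advance width times the font-wide descent..ascent band. */
    static Rect logical(double originX, double baselineY, double fontSize,
                        double advance, double ascent, double descent) {
        double s = fontSize / 1000.0;
        return new Rect(originX, baselineY + descent * s, advance * s, (ascent - descent) * s);
    }

    /** Visual bounds: the glyph's own outline box, moved to the glyph origin. */
    static Rect visual(double originX, double baselineY, double fontSize,
                       double llx, double lly, double urx, double ury) {
        double s = fontSize / 1000.0;
        return new Rect(originX + llx * s, baselineY + lly * s, (urx - llx) * s, (ury - lly) * s);
    }
}
```

For a period at (100, 700) in a 10 pt font with advance 250, ascent 800, descent -200 and an outline box of (70, 0, 180, 110), the logical box is about 2.5 by 10 units starting at y = 698, while the visual box is only about 1.1 by 1.1 units starting at (100.7, 700).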
> Create example to draw glyph sizes in rendered images
> -----------------------------------------------------
>
> Key: PDFBOX-3081
> URL: https://issues.apache.org/jira/browse/PDFBOX-3081
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: PDFBOX-679-toobig-marked-1.png
>
>
> DrawPrintTextLocations is PrintTextLocations on steroids: after a page is
> rendered to an image, the glyph bounds derived from the font sizes are drawn
> on it and the images are saved. This makes it possible to see whether a value
> shown by PrintTextLocations makes sense or not. The text output of
> PrintTextLocations will be kept. The classic PrintTextLocations will also be
> kept so that people don't end up writing support questions.
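Drawing bounds on a rendered page needs one coordinate conversion: PDF user space has its origin at the bottom left with 1 unit = 1/72 inch, while a rendered image has its origin at the top left and is scaled by the rendering resolution. A minimal sketch of that overlay step, assuming an unrotated page whose media box starts at (0, 0); `BoxOverlay` and its methods are hypothetical, and the image could come from something like PDFBox's `PDFRenderer.renderImageWithDPI`.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Sketch only: assumes no page rotation and a media box with origin (0, 0).
final class BoxOverlay {

    private BoxOverlay() {}

    /** Maps a PDF user-space rectangle (y-up, 72 units per inch) to image pixels (y-down). */
    static Rectangle2D toImage(Rectangle2D pdfRect, double pageHeightPt, float dpi) {
        double scale = dpi / 72.0;
        double x = pdfRect.getX() * scale;
        double y = (pageHeightPt - pdfRect.getMaxY()) * scale; // flip the y axis
        return new Rectangle2D.Double(x, y, pdfRect.getWidth() * scale, pdfRect.getHeight() * scale);
    }

    /** Outlines each rectangle in red on the rendered page image. */
    static void draw(BufferedImage image, Iterable<? extends Rectangle2D> pdfRects,
                     double pageHeightPt, float dpi) {
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.RED);
            g.setStroke(new BasicStroke(0.5f));
            for (Rectangle2D r : pdfRects) {
                g.draw(toImage(r, pageHeightPt, dpi));
            }
        } finally {
            g.dispose();
        }
    }
}
```

For instance, on a 792 pt tall page rendered at 144 dpi, a text box at (72, 720) of size 144 by 12 maps to the pixel rectangle (144, 120) of size 288 by 24.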
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)