[ https://issues.apache.org/jira/browse/PDFBOX-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Villu Ruusmann updated PDFBOX-577:
----------------------------------

    Attachment: textposition-randombg.zip

Due to popular demand, here's the sample application that paints the text background in random colors. I suspect that its PDFBox API usage is outdated, so take care.

> TextPosition should expose its bounding box
> -------------------------------------------
>
>                 Key: PDFBOX-577
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-577
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Villu Ruusmann
>         Attachments: 0001-PDFont.java-Add-methods-to-retreive-the-Ascent-and-D.patch, AFM-getHeight.png, AFM-getUpperRightY.png, textposition-randombg.zip
>
>
> It does not seem to be possible to calculate the bounding box of a TextPosition.
> IIUC, TextPosition#getY is the baseline of the text and TextPosition#getHeight is the absolute height of the text. When I subtract the latter from the former I get a top line, but this is only correct if the text does not contain descender characters.
> The attached screenshot (AFM-getHeight.png) shows the bounding boxes of TextPositions calculated as {#getX(), #getY() - #getHeight(), #getWidth(), #getHeight()} and painted in random colors. For example, the bounding boxes of parentheses are severely misplaced, which makes line-by-line text extraction impossible.
> Right now I've solved the problem by tweaking the AFM FontMetrics code so that it returns BoundingBox#getUpperRightY instead of BoundingBox#getHeight when queried via PDSimpleFont#getFontHeight(byte[], int, int). Another screenshot (AFM-getUpperRightY.png) shows how this restores the previously broken text extraction.
>
> It seems like a good idea to rework TextPosition so that it is aware of its bounding box:
> *) Replace the methods PDSimpleFont#getFontWidth(byte[], int, int) and PDSimpleFont#getFontHeight(byte[], int, int) with a single method PDSimpleFont#getFontBoundingBox(byte[], int, int)
> *) Replace the constructor TextPosition(Matrix, Matrix) with TextPosition(Matrix, BoundingBox)
> *) Add the new methods TextPosition#getBoundingBox and TextPosition#getBoundingBoxDir
> This shouldn't affect existing client applications, because TextPosition#getY and TextPosition#getHeight remain in place.
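The box calculation discussed in this issue can be sketched in plain Java. This is a minimal, hypothetical sketch and not PDFBox API. `TextBoxSketch`, `Box`, `naiveBox` and `metricsBox` are illustrative names. The sketch makes two assumptions:

* It uses a y-down coordinate system in which the baseline y grows toward the bottom of the page, as the description of TextPosition#getY above implies.
* Ascent and descent are AFM-style values in 1/1000 em units, with descent negative.

It compares the {getX(), getY() - getHeight(), getWidth(), getHeight()} box with a box that extends above the baseline by the scaled ascent and below it by the scaled descent.

```java
/**
 * Hypothetical sketch (not PDFBox API): two ways of boxing a run of text.
 * Assumes y grows downward and that ascent/descent are AFM-style values
 * in 1/1000 em units, with descent negative.
 */
public class TextBoxSketch {

    /** Axis-aligned box; top is the smaller y because y grows downward. */
    record Box(double x, double top, double width, double height) {
        double bottom() {
            return top + height;
        }
    }

    /** The box from the issue description: {x, y - height, width, height}. */
    static Box naiveBox(double x, double baselineY, double width, double height) {
        return new Box(x, baselineY - height, width, height);
    }

    /** A box reaching above the baseline by the ascent and below it by the descent. */
    static Box metricsBox(double x, double baselineY, double width,
                          double ascent, double descent, double fontSize) {
        double scale = fontSize / 1000.0;   // 1/1000 em units -> text space units
        double above = ascent * scale;      // baseline to the top of tall glyphs
        double below = -descent * scale;    // descent is negative, so negate it
        return new Box(x, baselineY - above, width, above + below);
    }

    public static void main(String[] args) {
        // Illustrative values only: 10 pt text, ascent 718, descent -207.
        Box metrics = metricsBox(72, 700, 30, 718, -207, 10);
        // Give the naive box the same height so that only its placement differs.
        Box naive = naiveBox(72, 700, 30, metrics.height());
        System.out.printf("naive:   top=%.2f bottom=%.2f%n", naive.top(), naive.bottom());
        System.out.printf("metrics: top=%.2f bottom=%.2f%n", metrics.top(), metrics.bottom());
    }
}
```

In `main`, both boxes have the same height. The naive box sits entirely above the baseline, while the metrics-based box extends below it into the descender area.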