[ 
https://issues.apache.org/jira/browse/PDFBOX-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Villu Ruusmann updated PDFBOX-577:
----------------------------------

    Attachment: textposition-randombg.zip

Due to popular demand, here's the sample application that paints the text 
background in random colors. I suspect that its PDFBox API usage is outdated, 
so take care.
                
> TextPosition should expose its bounding box
> -------------------------------------------
>
>                 Key: PDFBOX-577
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-577
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Villu Ruusmann
>         Attachments: 
> 0001-PDFont.java-Add-methods-to-retreive-the-Ascent-and-D.patch, 
> AFM-getHeight.png, AFM-getUpperRightY.png, textposition-randombg.zip
>
>
> It does not seem to be possible to calculate the bounding box of a 
> TextPosition.
> IIUC, TextPosition#getY is the baseline of the text and 
> TextPosition#getHeight is the absolute height of the text. Subtracting the 
> latter from the former gives the top line, but this is only correct if the 
> text contains no characters with descenders.
> The attached screenshot (AFM-getHeight.png) shows the bounding boxes of 
> TextPositions calculated as {#getX(), #getY() - #getHeight(), #getWidth(), 
> #getHeight()} and painted in random colors. For example, the bounding boxes 
> of parentheses are severely misplaced, which makes line-by-line text 
> extraction impossible.
> For now, I've worked around the problem by tweaking the AFM FontMetrics code 
> so that it returns BoundingBox#getUpperRightY instead of BoundingBox#getHeight 
> when queried via PDSimpleFont#getFontHeight(byte[], int, int). Another 
> screenshot (AFM-getUpperRightY.png) shows how this restores the previously 
> broken text extraction.
> It seems like a good idea to rework TextPosition so that it is aware of its 
> bounding box:
> *) Replace methods PDSimpleFont#getFontWidth(byte[], int, int) and 
> PDSimpleFont#getFontHeight(byte[], int, int) with a single method 
> PDSimpleFont#getFontBoundingBox(byte[], int, int)
> *) Replace the constructor TextPosition(Matrix, Matrix) with 
> TextPosition(Matrix, BoundingBox)
> *) Add new methods TextPosition#getBoundingBox, 
> TextPosition#getBoundingBoxDir. This shouldn't affect existing application 
> clients, because TextPosition#getY and TextPosition#getHeight remain in place.
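To make the arithmetic in the quoted description concrete, here is a minimal sketch of a text position that carries its own glyph bounding box. It is not PDFBox code: the class BoxedTextPosition, its nested GlyphBox and the naiveTop() helper are hypothetical names, and it assumes AFM-style glyph units (1/1000 em), a plain scale by font size with no rotation or skew (the proposal itself uses a Matrix), and y-down page coordinates where getY() is the baseline. The glyph numbers are illustrative, not taken from a real font.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical sketch, not the PDFBox API: a text position that knows its
// glyph bounding box. Assumes AFM glyph units (1/1000 em), uniform scaling
// by font size (no rotation/skew) and y-down page coordinates.
final class BoxedTextPosition {

    // Hypothetical stand-in for a font bounding box (llx, lly, urx, ury).
    static final class GlyphBox {
        final double llx, lly, urx, ury;
        GlyphBox(double llx, double lly, double urx, double ury) {
            this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
        }
    }

    private final double x;         // glyph origin on the page
    private final double baselineY; // baseline; y grows downwards
    private final double fontSize;
    private final GlyphBox box;

    BoxedTextPosition(double x, double baselineY, double fontSize, GlyphBox box) {
        this.x = x; this.baselineY = baselineY; this.fontSize = fontSize; this.box = box;
    }

    // Converts glyph units (1/1000 em) to page units.
    private double toPage(double glyphUnits) {
        return glyphUnits * fontSize / 1000.0;
    }

    // Legacy-style accessors: baseline and full box height (ury - lly).
    double getY() { return baselineY; }
    double getHeight() { return toPage(box.ury - box.lly); }

    // Naive top line: hangs the full height from the baseline.
    double naiveTop() { return getY() - getHeight(); }

    // Proposed-style accessor: the box reaches ury above the baseline and
    // |lly| below it (lly is negative for glyphs with descenders).
    Rectangle2D getBoundingBox() {
        double left = x + toPage(box.llx);
        double top = baselineY - toPage(box.ury);
        return new Rectangle2D.Double(left, top, toPage(box.urx - box.llx), getHeight());
    }

    public static void main(String[] args) {
        // Illustrative parenthesis-like glyph that descends below the baseline.
        BoxedTextPosition paren = new BoxedTextPosition(50, 100, 10,
                new GlyphBox(100, -200, 300, 700));
        Rectangle2D r = paren.getBoundingBox();
        System.out.println("naive top/bottom: " + paren.naiveTop() + " / " + paren.getY());
        System.out.println("bbox  top/bottom: " + r.getMinY() + " / " + r.getMaxY());
    }
}
```

With these numbers the naive box spans 91.0 to 100.0 while the glyph actually occupies 93.0 to 102.0, so the naive box is shifted up by exactly the descender depth, which is the kind of misplacement seen for the parentheses. Reporting upperRightY instead of the full height (the AFM workaround) fixes the top at 93.0 but still cuts the box off at the baseline. Because getY() and getHeight() keep their meaning in this sketch, existing callers would be unaffected, in line with the proposal.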

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
