Hayk Hayryan created PDFBOX-2749:
------------------------------------
Summary: Annotations character bounding boxes size 3 times higher than expected
Key: PDFBOX-2749
URL: https://issues.apache.org/jira/browse/PDFBOX-2749
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.8.4
Reporter: Hayk Hayryan
Priority: Critical
After text extraction, the character bounding boxes are about 3 times higher than
expected. For example, here are the first few character bounding boxes:
[90.1,46,6.64,40.06],[96.7,46,5.09,40.06],[101.79,46,5.8,40.06].
The values are x, y, width, height. The character widths are between 5
and 7 pixels, but every character height is 40.06 pixels. The actual
height of each line of text appears to be about 12 pixels. The example PDF
document is attached.
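As a quick sanity check on the numbers above, the sketch below does plain arithmetic on the three reported boxes. It does not call PDFBox. It assumes the reporter's estimated line height of about 12 (the `EXPECTED_LINE_HEIGHT` constant is that assumption, not a value read from the PDF). It also compares each reported width with the x advance to the next character. Those agree closely, which suggests the horizontal metrics are plausible and only the height is off, by a factor of roughly 3.3.

```python
# Bounding boxes copied from the report, in (x, y, width, height) order.
boxes = [
    (90.1, 46, 6.64, 40.06),
    (96.7, 46, 5.09, 40.06),
    (101.79, 46, 5.8, 40.06),
]

# Assumption: visual line height taken from the report's "about 12" estimate.
EXPECTED_LINE_HEIGHT = 12.0


def height_ratio(box, expected=EXPECTED_LINE_HEIGHT):
    """Return how many times taller the reported box is than the expected line height."""
    _x, _y, _width, height = box
    return height / expected


# Ratio of reported height to expected line height, per character.
ratios = [round(height_ratio(b), 2) for b in boxes]

# Horizontal advance from each character's x to the next character's x.
advances = [round(b2[0] - b1[0], 2) for b1, b2 in zip(boxes, boxes[1:])]

# Reported widths of the characters that have a following character.
widths = [b[2] for b in boxes[:-1]]

print("height ratios:", ratios)
print("x advances:", advances, "reported widths:", widths)
```

If the heights were correct, the ratios would be close to 1; here each one is about 3.34.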
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)