[ https://issues.apache.org/jira/browse/PDFBOX-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15721610#comment-15721610 ]

Roman commented on PDFBOX-3616:
-------------------------------

[~tilman] Yes, I cut out only the first page. The yellow highlights are drawn by a
JavaScript app in the browser, using coordinates we get from PDFBox, as shown below:

{code}
// PDFBox 2.0 package names; in 1.8 both classes live in org.apache.pdfbox.util
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class Extractor extends PDFTextStripper {
    //<...CUT...>
    @Override
    protected void writePage() throws IOException {
        // charactersByArticle is inherited from PDFTextStripper
        for (List<TextPosition> textList : charactersByArticle) {
            Iterator<TextPosition> textIter = textList.iterator();
            //<...CUT...>
            while (textIter.hasNext()) {
                TextPosition position = textIter.next();
                //<...CUT...>
                // fontDescriptor, yscale and asc are set in the code cut above
                float rh = Math.abs(fontDescriptor.getFontBoundingBox().getUpperRightY() / 1000 * yscale);
                float desc = Math.abs(fontDescriptor.getDescent() / 1000 * yscale);
                float capHeight = Math.abs(fontDescriptor.getCapHeight() / 1000 * yscale);
                if (capHeight == 0) {
                    // No CapHeight in the font descriptor: fall back to the glyph height
                    capHeight = position.getHeight();
                }
                float h = (rh + Math.max(Math.max(capHeight, position.getHeight()), asc)) / 2;

                // This value is used as the Y coordinate. The logic works for most
                // documents, but for this one the value is too small: for every glyph
                // it is off by the same constant, about 8 or 9.
                float y0 = position.getY() - h;
                //<...CUT...>
            }
        }
    }
}
{code}
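
To make the arithmetic of the height heuristic easy to check in isolation, here is a minimal stand-alone sketch of the same formula with PDFBox taken out. The class and parameter names are hypothetical. One assumption is labeled: the snippet computes {{asc}} in code that was cut, so here it is taken to be the font descriptor's ascent, scaled the same way as the other metrics (divide by 1000, multiply by {{yscale}}).

```java
/**
 * Stand-alone version of the y0 heuristic from the Extractor snippet.
 * Font metrics are in glyph space (1/1000 of text space), as in a PDF
 * FontDescriptor; yscale converts them to page units.
 * ASSUMPTION: "asc" is the descriptor's ascent, scaled like the other
 * metrics (the original computes it in code that was cut).
 */
public final class GlyphTopEstimate {

    private GlyphTopEstimate() {
    }

    /** Scales a FontDescriptor value (1/1000 units) to page units. */
    public static float scale(float glyphSpaceValue, float yscale) {
        return Math.abs(glyphSpaceValue / 1000f * yscale);
    }

    /**
     * Returns y0 = y - h, where h is the mean of the scaled font-bbox top
     * and the largest of cap height, glyph height and ascent.
     * y corresponds to position.getY() in the original snippet.
     */
    public static float estimateTop(float y, float glyphHeight, float bboxUpperRightY,
                                    float capHeightGlyph, float ascentGlyph, float yscale) {
        float rh = scale(bboxUpperRightY, yscale);
        float capHeight = scale(capHeightGlyph, yscale);
        if (capHeight == 0) {
            // Same fallback as the original: use the glyph height
            capHeight = glyphHeight;
        }
        float asc = scale(ascentGlyph, yscale);
        float h = (rh + Math.max(Math.max(capHeight, glyphHeight), asc)) / 2;
        return y - h;
    }

    public static void main(String[] args) {
        // Hypothetical font metrics: bbox top 900, cap height 700, ascent 750, size 10
        float y0 = estimateTop(100f, 6.5f, 900f, 700f, 750f, 10f);
        System.out.println("y0 = " + y0);
    }
}
```

With these example numbers, rh = 9, capHeight = 7 and asc = 7.5, so h = (9 + 7.5) / 2 = 8.25 and y0 = 100 - 8.25 = 91.75.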

> Characters shifted up
> ---------------------
>
>                 Key: PDFBOX-3616
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3616
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Roman
>         Attachments: 00051-2a7-00052-2a7.pdf_page0.pdf, 
> PDFBOX-3616-marked-1.png, screenshot-1.png
>
>
> We have tried this on both 1.8.12 and 2.0.3 and got the same result:
> character positions are shifted up.
> We are assuming the X and Y positions are relative to the CropBox.
> See [^screenshot-1.png]: the yellow highlights sit above the text.
> PDF doc is attached.
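
The report assumes the X and Y positions are relative to the CropBox. For reference, here is a minimal sketch of mapping a point from PDF user space (origin at the bottom-left, y pointing up) to top-left, y-down coordinates relative to the CropBox, which is the convention a browser overlay uses. The class and method names are hypothetical and not PDFBox API. Page rotation is ignored, and the sketch makes no claim about which coordinate space {{TextPosition}} itself reports.

```java
/**
 * Converts a point from unrotated PDF user space (origin bottom-left, y up)
 * to coordinates relative to the top-left corner of a CropBox (y down).
 * ASSUMPTION: no /Rotate and no other transforms apply; hypothetical helper.
 */
public final class CropBoxCoords {

    /** Rectangle given by its lower-left (llx, lly) and upper-right (urx, ury) corners. */
    public static final class Box {
        public final float llx, lly, urx, ury;

        public Box(float llx, float lly, float urx, float ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

    private CropBoxCoords() {
    }

    /** Returns {left, top} from the CropBox's top-left corner, times pixelsPerUnit. */
    public static float[] toTopLeft(float x, float y, Box cropBox, float pixelsPerUnit) {
        float left = (x - cropBox.llx) * pixelsPerUnit;  // distance from the left edge
        float top = (cropBox.ury - y) * pixelsPerUnit;   // distance down from the top edge
        return new float[] {left, top};
    }

    public static void main(String[] args) {
        Box crop = new Box(20f, 30f, 592f, 762f);
        float[] p = toTopLeft(120f, 700f, crop, 1f);
        System.out.println("left = " + p[0] + ", top = " + p[1]);
    }
}
```

With the CropBox [20 30 592 762], the point (120, 700) maps to left = 100, top = 62. If {{top}} were measured from a MediaBox of [0 0 612 792] instead, the result would be 92. Every point would shift by the same 30 units (792 - 762), so mixing up the boxes is one way a constant vertical offset can arise. Nothing here shows that this is the cause in this document.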



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
