[ 
https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226901#comment-16226901
 ] 

Tilman Hausherr edited comment on PDFBOX-3970 at 10/31/17 2:42 PM:
-------------------------------------------------------------------

I mean this part:
{code}
        BoundingBox bbox = font.getBoundingBox();
        if (bbox.getLowerLeftY() < Short.MIN_VALUE)
        {
            // PDFBOX-2158 and PDFBOX-3130
            // files by Salmat eSolutions / ClibPDF Library
            bbox.setLowerLeftY(- (bbox.getLowerLeftY() + 65536));
        }
        // 1/2 the bbox is used as the height todo: why?
        float glyphHeight = bbox.getHeight() / 2;
        
        // sometimes the bbox has very high values, but CapHeight is OK
        PDFontDescriptor fontDescriptor = font.getFontDescriptor();
        if (fontDescriptor != null)
        {
            float capHeight = fontDescriptor.getCapHeight();
            if (capHeight != 0 && (capHeight < glyphHeight || glyphHeight == 0))
            {
                glyphHeight = capHeight;
            }
        }
{code}
That should somehow be replaced with some of the code from 
DrawPrintLocations.calculateGlyphBounds() to get a {{glyphHeight}}.


was (Author: tilman):
I mean this part:
{code}
        BoundingBox bbox = font.getBoundingBox();
        if (bbox.getLowerLeftY() < Short.MIN_VALUE)
        {
            // PDFBOX-2158 and PDFBOX-3130
            // files by Salmat eSolutions / ClibPDF Library
            bbox.setLowerLeftY(- (bbox.getLowerLeftY() + 65536));
        }
        // 1/2 the bbox is used as the height todo: why?
        float glyphHeight = bbox.getHeight() / 2;
        
        // sometimes the bbox has very high values, but CapHeight is OK
        PDFontDescriptor fontDescriptor = font.getFontDescriptor();
        if (fontDescriptor != null)
        {
            float capHeight = fontDescriptor.getCapHeight();
            if (capHeight != 0 && (capHeight < glyphHeight || glyphHeight == 0))
            {
                glyphHeight = capHeight;
            }
        }
{code}
That should somehow be replaced with some of the code from 
DrawPrintLocations.calculateGlyphBounds().

> x,y co-ordinates of the text inside the cell are not getting correctly.
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Operating system: Windows 7 (64 bit).
>            Reporter: Navnath Kumbhar
>              Labels: how-to
>         Attachments: formula-marked-34.png, 
> paragraphNextToTable-marked-1.png, paragraphNextToTable.pdf
>
>
> Hello Support Team,
> I am working on a project which parses a whole PDF document and stores the 
> extracted text in some .txt file which can be read by other product.
> My issue is regarding extracting the text inside the cell of a table: 
> *x,y co-ordinates of the text inside the cell are not getting correctly.*
> Y value of the last text line in the cell is getting larger than cell's max-Y 
> value.
> I have attached the test file with this bug.
> As you can see in the test document, there is one cell along-with text in it 
> and a text paragraph next to that cell.
> x-y coordinates that I get from pdfbox for all the paths (two vertical and 
> two horizontal lines) of the cell are:
> (in x1,y1,x2,y2 format)
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1 : [100,88,100,120]
> Vertical line 2: [220,88,220,120]
> (Y values of the above paths are final values by subtracting the actual value 
> given by pdfbox from height of the page as I see that for paths, y-values are 
> processed from bottom to up)
> And bounding box of the last line in that cell is : [102,114,59,7] and hence 
> max-Y of that line becomes 121 (min-Y + height)
>  
> So, if we consider max-Y value of that cell (i.e. 120)  and that of last line 
> in that cell (i.e. 121), clearly, that line goes out of that cell.
> What can be the possible reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to