[ https://issues.apache.org/jira/browse/PDFBOX-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated PDFBOX-3970:
------------------------------------
    Labels: how-to  (was: )

> x,y co-ordinates of the text inside the cell are not getting correctly.
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>         Environment: Operating system: Windows 7 (64 bit).
>            Reporter: Navnath Kumbhar
>              Labels: how-to
>         Attachments: formula-marked-34.png,
> paragraphNextToTable-marked-1.png, paragraphNextToTable.pdf
>
>
> Hello Support Team,
> I am working on a project which parses a whole PDF document and stores the
> extracted text in some .txt file which can be read by other product.
> My issue is regarding extracting the text inside the cell of a table:
> *x,y co-ordinates of the text inside the cell are not getting correctly.*
> Y value of the last text line in the cell is getting larger than cell's max-Y
> value.
> I have attached the test file with this bug.
> As you can see in the test document, there is one cell along-with text in it
> and a text paragraph next to that cell.
> x-y coordinates that I get from pdfbox for all the paths (two vertical and
> two horizontal lines) of the cell are:
> (in x1,y1,x2,y2 format)
> Horizontal line 1: [100,88,220,88]
> Horizontal line 2: [100,120,220,120]
> Vertical line 1  : [100,88,100,120]
> Vertical line 2  : [220,88,220,120]
> (Y values of the above paths are final values by subtracting the actual value
> given by pdfbox from height of the page as I see that for paths, y-values are
> processed from bottom to up)
> And bounding box of the last line in that cell is : [102,114,59,7] and hence
> max-Y of that line becomes 121 (min-Y + height)
>
> So, if we consider max-Y value of that cell (i.e. 120) and that of last line
> in that cell (i.e. 121), clearly, that line goes out of that cell.
> What can be the possible reason for this?
> Thank you in advance!
> Regards,
> Navnath Kumbhar



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
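
For reference, the arithmetic described in the report can be reproduced with a small standalone sketch. It does not call PDFBox; the class name `CellBoundsCheck`, its helper methods, the page height of 792 and the tolerance of 1.5 units are all hypothetical and chosen only for illustration. The cell and text values are the ones quoted in the report. The tolerance check is one possible way to make a containment test less brittle, not a fix confirmed in this issue.

```java
public class CellBoundsCheck {

    /** Converts a bottom-up y value (as reported for paths) to a top-down y value. */
    public static double flipY(double pageHeight, double bottomUpY) {
        return pageHeight - bottomUpY;
    }

    /** Max-Y of a top-down box given as min-Y plus height. */
    public static double maxY(double minY, double height) {
        return minY + height;
    }

    /**
     * Returns true if the text's vertical extent lies inside the cell's
     * vertical extent, allowing the given tolerance on both edges.
     */
    public static boolean fitsVertically(double cellMinY, double cellMaxY,
                                         double textMinY, double textMaxY,
                                         double tolerance) {
        return textMinY >= cellMinY - tolerance
            && textMaxY <= cellMaxY + tolerance;
    }

    public static void main(String[] args) {
        // Cell edges from the report (already flipped to top-down values).
        double cellMinY = 88;
        double cellMaxY = 120;

        // Last text line from the report: [x, y, width, height] = [102, 114, 59, 7].
        double textMinY = 114;
        double textHeight = 7;
        double textMaxY = maxY(textMinY, textHeight); // 114 + 7 = 121

        // Hypothetical page height, only to illustrate the flip the reporter describes.
        double pageHeight = 792;
        System.out.println("flipped y for 704 = " + flipY(pageHeight, 704));

        System.out.println("text max-Y = " + textMaxY);
        System.out.println("strict fit = "
            + fitsVertically(cellMinY, cellMaxY, textMinY, textMaxY, 0.0));
        System.out.println("fit with tolerance 1.5 = "
            + fitsVertically(cellMinY, cellMaxY, textMinY, textMaxY, 1.5));
    }
}
```

With the reported numbers the strict check fails by exactly one unit (121 against 120), which matches the reporter's observation; how large a tolerance is appropriate, if any, depends on how the text height was obtained.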