[ https://issues.apache.org/jira/browse/PDFBOX-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179965#comment-13179965 ]
Ilija Pavlic commented on PDFBOX-1201:
--------------------------------------

It seems that the missed text is part of a larger text box that starts and ends outside the capture region, even though the text itself is located inside the capture region.

> PDFTextStripperByArea y coordinate shifted "up"
> -----------------------------------------------
>
>                 Key: PDFBOX-1201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1201
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>            Priority: Minor
>
> The text stripper region seems to be shifted up from the given coordinates,
> causing lines at the bottom of the region to be missed and lines above the
> defined region to be included.
> ...
> PDPage page = (PDPage) allPages.get(0);
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
> stripper.addRegion("test region", region);
> // overlay the region with a cyan rectangle to check if I got the
> // coordinates and dimensions right
> PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
> contentStream.setNonStrokingColor( Color.CYAN );
> contentStream.fillRect(x, y, width, height);
> contentStream.close();
> stripper.extractRegions(page);
> String content = stripper.getTextForRegion("test region");
> ...
> document.save(...);
> ...
> The cyan rectangle overlays the desired region exactly when viewing the saved
> output document. On the other hand, the stripper misses a couple of lines at
> the bottom of the rectangle and includes a couple of lines above the rectangle.
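
One possible explanation, not confirmed anywhere in this thread: the snippet passes the same x, y, width and height to both fillRect and addRegion. fillRect takes PDF user-space coordinates, whose origin is the bottom-left corner of the page. PDFTextStripperByArea appears to match regions against text positions measured from the top-left corner. If that assumption holds, the region would have to be flipped vertically before being passed to addRegion. Below is a minimal sketch of that flip. The RegionFlip class, the toTopLeftOrigin helper and the 792-point page height are hypothetical names and values chosen for illustration, and the sketch ignores page rotation and crop-box offsets.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
class RegionFlip {

    // Converts a rectangle given in bottom-left-origin coordinates (as used
    // by fillRect) into the equivalent rectangle in top-left-origin
    // coordinates. pageHeight is the page height in the same units.
    static Rectangle2D.Float toTopLeftOrigin(Rectangle2D.Float pdfRect, float pageHeight) {
        // pdfRect.y is the distance from the page bottom to the lower edge
        // of the rectangle, so the distance from the page top to the upper
        // edge is pageHeight - y - height.
        float yFromTop = pageHeight - pdfRect.y - pdfRect.height;
        return new Rectangle2D.Float(pdfRect.x, yFromTop, pdfRect.width, pdfRect.height);
    }

    public static void main(String[] args) {
        // Assumed example values: a 792-point-high page (US Letter) and an
        // arbitrary rectangle.
        Rectangle2D.Float drawn = new Rectangle2D.Float(50f, 100f, 200f, 150f);
        Rectangle2D.Float flipped = toTopLeftOrigin(drawn, 792f);
        System.out.println(flipped);
    }
}
```

Under the same assumption, the rectangle returned by toTopLeftOrigin would go to stripper.addRegion, while the unflipped x, y, width and height would still go to contentStream.fillRect.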