PDFTextStripperByArea y coordinate shifted "up"
-----------------------------------------------

                 Key: PDFBOX-1201
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1201
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.6.0
            Reporter: Ilija Pavlic
            Priority: Minor


The text stripper region seems to be shifted up from the given coordinates, 
causing lines at the bottom of the defined region to be missed and lines above 
the defined region to be included.

...
PDPage page = (PDPage) allPages.get(0);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();

Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
stripper.addRegion("test region", region);

// overlay the region with a cyan rectangle to check if I got the coordinates
// and dimensions right
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
contentStream.setNonStrokingColor( Color.CYAN );
contentStream.fillRect(x, y, width, height);
contentStream.close();

stripper.extractRegions(page);
String content = stripper.getTextForRegion("test region");
...
document.save(...);
...

The cyan rectangle overlays the desired region exactly when viewing the saved 
output document. The stripper, on the other hand, misses a couple of lines at 
the bottom of the rectangle and includes a couple of lines above the rectangle.
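
One thing that may be worth checking, stated here as an assumption and not as 
a confirmed cause: the drawing call and the stripper might not share a 
coordinate origin. PDF user space normally has its origin at the bottom-left 
corner of the page with y growing upward, while text positions are commonly 
reported relative to the top-left corner with y growing downward. The sketch 
below is a minimal illustration of converting a rectangle between the two 
conventions. RegionFlip, toTopLeftOrigin and the page height of 792 are 
hypothetical names and values chosen for the example, not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;

public class RegionFlip {

    /**
     * Converts a rectangle given in a bottom-left-origin system (y grows upward)
     * into the equivalent rectangle in a top-left-origin system (y grows downward).
     * pageHeight is the height of the page in the same units as the rectangle.
     * This helper is hypothetical and only illustrates the arithmetic.
     */
    static Rectangle2D.Float toTopLeftOrigin(Rectangle2D.Float r, float pageHeight) {
        // The top edge of the rectangle sits at (r.y + r.height) measured from the
        // bottom of the page, so its distance from the top of the page is
        // pageHeight - (r.y + r.height).
        float flippedY = pageHeight - r.y - r.height;
        return new Rectangle2D.Float(r.x, flippedY, r.width, r.height);
    }

    public static void main(String[] args) {
        // Assumed example values: a US Letter page is 792 points high.
        float pageHeight = 792f;
        Rectangle2D.Float drawn = new Rectangle2D.Float(50f, 600f, 200f, 100f);
        Rectangle2D.Float forStripper = toTopLeftOrigin(drawn, pageHeight);
        System.out.println("drawn:        " + drawn);
        System.out.println("top-left form: " + forStripper);
    }
}
```

If the extracted text matches the cyan rectangle after such a conversion, the 
difference would be one of coordinate conventions and not an offset in the 
stripper itself.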
