PDFTextStripperByArea y coordinate shifted "up"
-----------------------------------------------
Key: PDFBOX-1201
URL: https://issues.apache.org/jira/browse/PDFBOX-1201
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.6.0
Reporter: Ilija Pavlic
Priority: Minor
The text stripper region seems to be shifted up from the given coordinates,
causing lines at the bottom of the region to be missed and lines above the
defined region to be included.
...
PDPage page = (PDPage) allPages.get(0);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
stripper.addRegion("test region", region);
// overlay the region with a cyan rectangle to check if I got the coordinates
// and dimensions right
PDPageContentStream contentStream = new PDPageContentStream(document, page,
true, true);
contentStream.setNonStrokingColor( Color.CYAN );
contentStream.fillRect(x, y, width, height);
contentStream.close();
stripper.extractRegions(page);
String content = stripper.getTextForRegion("test region");
...
document.save(...);
...
The cyan rectangle overlays the desired region exactly when viewing the saved
output document. The stripper, on the other hand, misses a couple of lines at
the bottom of the rectangle and includes a couple of lines above the rectangle.
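
One possible explanation, stated here as an assumption rather than a confirmed
diagnosis: fillRect takes coordinates in PDF user space, whose origin is the
lower-left corner of the page, while the region given to PDFTextStripperByArea
may be interpreted with the origin at the top-left and y growing downward. If
that is the case, the same (x, y, width, height) values describe two different
areas. The sketch below uses only java.awt.geom to show the conversion; the
helper name flipY, the sample rectangle, and the page height of 792 points are
hypothetical and not taken from the report.

```java
import java.awt.geom.Rectangle2D;

class RegionFlip {
    // Hypothetical helper: takes a rectangle expressed with a bottom-left
    // origin (y grows upward) and returns the same area expressed with a
    // top-left origin (y grows downward).
    static Rectangle2D.Float flipY(Rectangle2D.Float r, float pageHeight) {
        return new Rectangle2D.Float(r.x, pageHeight - r.y - r.height,
                r.width, r.height);
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed page height in points
        // Rectangle as it would be passed to fillRect (bottom-left origin).
        Rectangle2D.Float drawn = new Rectangle2D.Float(50f, 100f, 200f, 80f);
        // Candidate rectangle for addRegion if the stripper uses a top-left origin.
        Rectangle2D.Float forStripper = flipY(drawn, pageHeight);
        System.out.println(forStripper.y);
    }
}
```

In the snippet from the report, the page height would come from the page's
media box, and the flipped rectangle would be passed to addRegion while the
original values stay with fillRect.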