[
https://issues.apache.org/jira/browse/PDFBOX-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ilija Pavlic updated PDFBOX-1201:
---------------------------------
Unfortunately, I cannot share the sample PDF.
> PDFTextStripperByArea y coordinate shifted "up"
> -----------------------------------------------
>
> Key: PDFBOX-1201
> URL: https://issues.apache.org/jira/browse/PDFBOX-1201
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.6.0
> Reporter: Ilija Pavlic
> Priority: Minor
>
> The text stripper region seems to be shifted up from the given coordinates,
> causing lines at the bottom of the region to be excluded and lines above the
> defined region to be included.
> ...
> PDPage page = (PDPage) allPages.get(0);
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
> stripper.addRegion("test region", region);
> // overlay the region with a cyan rectangle to check if I got the coordinates
> // and dimensions right
> PDPageContentStream contentStream = new PDPageContentStream(document, page,
> true, true);
> contentStream.setNonStrokingColor( Color.CYAN );
> contentStream.fillRect(x, y, width, height);
> contentStream.close();
> stripper.extractRegions(page);
> String content = stripper.getTextForRegion("test region");
> ...
> document.save(...);
> ...
> The cyan rectangle overlays the desired region exactly when viewing the saved
> output document. On the other hand, the stripper misses a couple of lines at
> the bottom of the rectangle and includes a couple of lines above the rectangle.
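
One possible explanation, not confirmed anywhere in this report, is a coordinate-system mismatch: the sketch below assumes that PDPageContentStream draws in PDF user space (origin at the bottom-left, y increasing upward) while PDFTextStripperByArea reads the region with the origin at the top-left and y increasing downward. Under that assumption the same (x, y, width, height) describes two different areas, and the region would need its y value flipped against the page height before being passed to addRegion. The class name RegionCoordinates, the helper toTopLeftOrigin, and the 792 pt page height are hypothetical and only serve the illustration; it uses nothing beyond java.awt.geom.

```java
import java.awt.geom.Rectangle2D;

public class RegionCoordinates {

    /**
     * Converts a rectangle given in bottom-left-origin coordinates (y up)
     * into the equivalent rectangle in top-left-origin coordinates (y down).
     * Hypothetical helper; pageHeight is the height of the page in the same units.
     */
    static Rectangle2D.Float toTopLeftOrigin(float x, float y, float width,
                                             float height, float pageHeight) {
        // The rectangle's top edge sits at (y + height) measured from the
        // bottom of the page, so its distance from the top of the page is
        // pageHeight - (y + height).
        float top = pageHeight - (y + height);
        return new Rectangle2D.Float(x, top, width, height);
    }

    public static void main(String[] args) {
        // Assumed example values: a 792 pt tall page and a region whose
        // lower-left corner is at (50, 600) in bottom-left-origin coordinates.
        Rectangle2D.Float region = toTopLeftOrigin(50f, 600f, 200f, 100f, 792f);
        System.out.println("x=" + region.x + " y=" + region.y
                + " w=" + region.width + " h=" + region.height);
    }
}
```

If the assumption holds, the rectangle passed to fillRect would stay as it is and only the one passed to stripper.addRegion("test region", ...) would be converted. If the page has a non-zero media box origin or a rotation, this simple flip would not be sufficient.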