[
https://issues.apache.org/jira/browse/PDFBOX-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed PDFBOX-167.
----------------------------------
Resolution: Cannot Reproduce
In October 2013, I e-mailed both people mentioned in this issue:
{quote}
Is this still an issue? I looked at the code and it is different than the one
mentioned. But I can't test the code mentioned because the links are broken.
{quote}
I never got a response. I am thus closing this issue.
> wrong words highlighted
> -----------------------
>
> Key: PDFBOX-167
> URL: https://issues.apache.org/jira/browse/PDFBOX-167
> Project: PDFBox
> Issue Type: Bug
> Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1487217
> Originally submitted by nobody on 2006-05-12 01:51.
> PDFBox appears to have a problem properly highlighting
> words from the following PDF. I am using a very simple
> servlet to do this, and it works fine for most PDFs.
> With this one, however, it highlights the wrong words.
> Unfortunately I am not smart enough to figure out what
> is going on myself, so could anybody help me with this?
> The files can be found here:
> http://www.impressie.nl/matthijs/PDFHighlight.java
> http://www.impressie.nl/matthijs/Rectificatie%20van%20Richtlijn%20Handhaving%20van%20Intellectuele-eigendomsrechten.pdf
> Matthijs Bierman
> [email protected]
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> That document is in a password-protected area, so it can't be read by anyone
> else! I have a similar problem with this doc:
> http://www.usc.edu/schools/business/FBE/seminars/papers/AE_4-28-06_FISMAN-parking.pdf
> ... but I think I've figured this one out. The second page of this document
> is entirely blank, and checking by hand I can see that the highlights after
> p1 are all in positions that would be correct if they were one page further
> on; it appears that the page count isn't being incremented for the blank
> page. Tracing this back in the code I see this:
>     PDStream contentStream = nextPage.getContents();
>     if( contentStream != null )
>     {
>         COSStream contents = contentStream.getStream();
>         processPage( nextPage, contents );
>     }
> (PDFTextStripper.java line 255). That's skipping the blank page and giving me
> the wrong page number, I think, and I guess that the problem can be resolved
> by moving currentPageNo++ from inside processPage to just above that test.
> -- [email protected]
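The off-by-one described in the comment above can be reproduced without PDFBox. The following is only a minimal sketch under stated assumptions, not the PDFTextStripper code: each page is modelled as a String, with null standing in for a page whose getContents() returns null, and the class and method names (PageCounterSketch, numberPagesBuggy, numberPagesFixed) are hypothetical. It contrasts incrementing the counter inside the content check with incrementing it just above that check, as the comment suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the page loop; not part of PDFBox.
class PageCounterSketch {

    // Mirrors the reported behaviour: the counter advances only when a page
    // has content, so a blank page shifts every later page number down by one.
    static List<Integer> numberPagesBuggy(List<String> pageContents) {
        List<Integer> assigned = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pageContents) {
            if (contents != null) {
                currentPageNo++; // incremented "inside processPage"
                assigned.add(currentPageNo);
            }
        }
        return assigned;
    }

    // Mirrors the suggested fix: the counter advances for every page,
    // just above the null test, so blank pages are still counted.
    static List<Integer> numberPagesFixed(List<String> pageContents) {
        List<Integer> assigned = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pageContents) {
            currentPageNo++; // moved above the content check
            if (contents != null) {
                assigned.add(currentPageNo);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Page 2 is blank, as in the document described in the comment.
        List<String> pages = Arrays.asList("text on page 1", null, "text on page 3");
        System.out.println("buggy: " + numberPagesBuggy(pages)); // prints "buggy: [1, 2]"
        System.out.println("fixed: " + numberPagesFixed(pages)); // prints "fixed: [1, 3]"
    }
}
```

In the buggy variant the third physical page is reported as page 2, which matches the symptom of highlights landing one page too early after a blank page.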