[jira] [Closed] (PDFBOX-167) wrong words highlighted

Tilman Hausherr (JIRA) Sat, 03 May 2014 14:29:25 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tilman Hausherr closed PDFBOX-167.
----------------------------------

    Resolution: Cannot Reproduce

In October 2013, I e-mailed both people mentioned in this issue:
{quote}
Is this still an issue? I looked at the code and it is different than the one 
mentioned. But I can't test the code mentioned because the links are broken.
{quote}
I never got a response. I am thus closing this issue.

> wrong words highlighted
> -----------------------
>
>                 Key: PDFBOX-167
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-167
>             Project: PDFBox
>          Issue Type: Bug
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1487217
> Originally submitted by nobody on 2006-05-12 01:51.
> PDFBox appears to have a problem properly highlighting
> words from the following PDF. I am using a very simple
> servlet to do this, and it works fine for most PDFs.
> With this one, however, it highlights the wrong words.
> Unfortunately I am not smart enough to figure out what
> is going on myself, so could anybody help me with this?
> The files can be found here:
> http://www.impressie.nl/matthijs/PDFHighlight.java
> http://www.impressie.nl/matthijs/Rectificatie%20van%20Richtlijn%20Handhaving%20van%20Intellectuele-eigendomsrechten.pdf
> Matthijs Bierman
> [email protected]
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> That document is in a password-protected area, so it can't be read by anyone 
> else! I have a similar problem with this doc:
> http://www.usc.edu/schools/business/FBE/seminars/papers/AE_4-28-06_FISMAN-parking.pdf
> ... but I think I've figured this one out. The second page of this document 
> is entirely blank, and checking by hand I can see that the highlights after 
> p1 are all in positions that would be correct if they were one page further 
> on; it appears that the page count isn't being incremented for the blank 
> page. Tracing this back in the code I see this:
>             PDStream contentStream = nextPage.getContents();
>             if( contentStream != null )
>             {
>                 COSStream contents = contentStream.getStream();
>                 processPage( nextPage, contents );
>             }
> (PDFTextStripper.java line 255). That skips the blank page and gives me the 
> wrong page number, I think, and I guess that the problem can be resolved by 
> moving currentPageNo++ from inside processPage to just above that test.
> -- [email protected]
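
The fix suggested in the comment above can be illustrated with a small standalone sketch. This is not PDFBox code: the class PageCounterSketch and both methods are hypothetical, and each page is modelled as a String that is null when the page has no content stream. It only shows why incrementing the counter inside the null test shifts every later page number down by one, and how moving the increment above the test avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the page loop described in the comment; not PDFBox code.
public class PageCounterSketch {

    // Counter is incremented only when a page is processed (the reported behaviour).
    // A null element models a blank page whose content stream is null.
    public static List<Integer> buggyPageNumbers(List<String> pages) {
        List<Integer> reported = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pages) {
            if (contents != null) {
                currentPageNo++; // skipped for blank pages
                reported.add(currentPageNo);
            }
        }
        return reported;
    }

    // Counter is incremented just above the null test (the suggested change).
    public static List<Integer> fixedPageNumbers(List<String> pages) {
        List<Integer> reported = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pages) {
            currentPageNo++; // counts blank pages too
            if (contents != null) {
                reported.add(currentPageNo);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        // Three pages, the second one blank.
        List<String> pages = Arrays.asList("page one", null, "page three");
        System.out.println(buggyPageNumbers(pages)); // [1, 2]
        System.out.println(fixedPageNumbers(pages)); // [1, 3]
    }
}
```

With the second page blank, the first variant reports the third page as page 2, which matches the symptom of highlights landing one page too early.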



--
This message was sent by Atlassian JIRA
(v6.2#6252)
