PDF Highlighting Again

Renaud Waldura Thu, 09 Nov 2006 10:56:14 -0800

Greetings:
 
I read the mailing-list archives about this topic and found the PDFBox
solutions at: http://www.pdfbox.org/userguide/highlighting.html


Basically there are 3 options:
1- append query parameters to the PDF URL
2- generate a highlight XML document that Acrobat Reader will download
separately
3- alter contents of the PDF

I've only tried (1) so far and I really like the user experience: Acrobat
Reader displays a list of matches in a small pane on the right. By clicking
in this pane the user can bounce around the document. I don't think (2) or
(3) can match that.

I implemented (1) by rewriting the Lucene query and creating a query string
with the terms. I am now facing a problem where the number of terms expand
beyond the limits of the query string.

I realized it's silly to ask Acrobat to highlight all the query terms when
in reality only a subset of them is present in the document. But how can I
compute that subset?

I'm thinking I might have to tokenize the document text (I have it), then
compute the intersection between the set of all terms and the set of terms
from the rewritten query. Blech. Sounds expensive. Any other ideas?

How does the contrib'ed highlighter work? I'm guessing it's doing something
similar. Any way I could not tokenize the document twice (they can be really
long)? 

--Renaud




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

PDF Highlighting Again

Reply via email to