Greetings: I read the mailing-list archives about this topic and found the PDFBox solutions at: http://www.pdfbox.org/userguide/highlighting.html
Basically there are 3 options: 1- append query parameters to the PDF URL 2- generate a highlight XML document that Acrobat Reader will download separately 3- alter contents of the PDF I've only tried (1) so far and I really like the user experience: Acrobat Reader displays a list of matches in a small pane on the right. By clicking in this pane the user can bounce around the document. I don't think (2) or (3) can match that. I implemented (1) by rewriting the Lucene query and creating a query string with the terms. I am now facing a problem where the number of terms expand beyond the limits of the query string. I realized it's silly to ask Acrobat to highlight all the query terms when in reality only a subset of them is present in the document. But how can I compute that subset? I'm thinking I might have to tokenize the document text (I have it), then compute the intersection between the set of all terms and the set of terms from the rewritten query. Blech. Sounds expensive. Any other ideas? How does the contrib'ed highlighter work? I'm guessing it's doing something similar. Any way I could not tokenize the document twice (they can be really long)? --Renaud --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
