On Thu, Mar 09, 2006 at 06:50:09AM -0500, Leonard Rosenthol wrote:
> 
> >So - is iText a good way to extract just the text of a page so that we
> >can use it to calculate the offsets?
> 
>         No.
> 
>         Look at PdfBox or Multivalent.

Thanks for the pointer. The char offset method doesn't seem too reliable
(something that's 150 chars into the text file from PDFBox is 200 chars in
according to the highlighter in Reader).

But with a word-based offset (and a lot of guesswork as to what Acrobat Reader
thinks is a word boundary), this looks like it might actually fly :)
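
A rough sketch of the word-based idea, in Python just to keep it short. The
big assumption is that a "word" is any run of non-whitespace characters;
what Reader really treats as a word boundary is still guesswork, and the
function names here are made up for illustration.

```python
import re

# Assumption: a "word" is a maximal run of non-whitespace characters.
# Acrobat Reader's actual word-boundary rules may well differ.
WORD = re.compile(r"\S+")


def char_to_word_index(text, char_offset):
    """Return the index of the word containing (or next after) char_offset."""
    for index, match in enumerate(WORD.finditer(text)):
        if char_offset < match.end():
            return index
    raise ValueError("offset is past the last word")


def word_span(text, word_index):
    """Return the (start, end) character offsets of the word at word_index."""
    for index, match in enumerate(WORD.finditer(text)):
        if index == word_index:
            return match.span()
    raise IndexError("word index out of range")
```

The point is that two extractions of the same page can disagree on
whitespace, so character offsets drift, while the word index stays put as
long as both sides split words the same way.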

-- 
Chris Searle
[EMAIL PROTECTED]


_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions