Re: Newbie question

J.Pietschmann Fri, 04 Sep 2009 10:32:25 -0700

On 04.09.2009 15:22, Dola Woolfe wrote:

(Sounds like more than the 1 hour I was allocating for it.)


PDF as a format isn't meant to be parsed for advanced text processing;
it was designed for presentation. PDF generators could make your job
of parsing text out of the file arbitrarily hard. As an extreme (and
rather theoretical) example, a PDF could contain two text streams,
"Tiset" and "hsiatx", with embedded positioning commands, which
read on the screen as "This is a text". In any case, even putting
up reasonable guards against running into out-of-order text blocks
will take a few days, unless you find a ready-to-use library for
this task (no, I don't have pointers).
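
To make the "Tiset"/"hsiatx" example concrete, here is a rough Python
sketch of the kind of guard I mean. It assumes you have already pulled
each glyph out of the PDF together with its x position on a single line;
the Glyph tuple, the reading_order function, the coordinates and the
1.5 glyph-width gap threshold are all made up for illustration and do not
come from any PDF library. Real files also need line grouping, font
metrics, rotation and so on.

```python
from typing import List, Tuple

# Hypothetical representation: (x position on the line, character).
Glyph = Tuple[float, str]


def reading_order(streams: List[List[Glyph]], glyph_width: float = 1.0) -> str:
    """Merge glyphs from several text streams into left-to-right order.

    A horizontal gap wider than 1.5 glyph widths is treated as a word space.
    The threshold is an arbitrary assumption, not a PDF rule.
    """
    # Ignore stream order entirely and sort by position on the page.
    glyphs = sorted((g for stream in streams for g in stream), key=lambda g: g[0])
    out = []
    prev_x = None
    for x, ch in glyphs:
        if prev_x is not None and x - prev_x > 1.5 * glyph_width:
            out.append(" ")
        out.append(ch)
        prev_x = x
    return "".join(out)


# The two streams from the example, with invented x positions.
stream_a = list(zip([0, 2, 6, 11, 13], "Tiset"))
stream_b = list(zip([1, 3, 5, 8, 10, 12], "hsiatx"))

# Reading the streams in file order gives nonsense.
naive = "".join(ch for stream in (stream_a, stream_b) for _, ch in stream)
print(naive)                                # Tisethsiatx
print(reading_order([stream_a, stream_b]))  # This is a text
```

Sorting by position recovers the visible text here only because
everything sits on one line with a fixed glyph width; that is exactly the
part that takes days to get right for arbitrary documents.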

If you can, try to get your source text in a more processing-friendly
format, like DocBook XML.

J.Pietschmann

