siegfried wrote:

Are there any tools that will accept a PDF and produce XML? Might this be a feature of FOP someday?

Thanks,

Siegfried


That's highly improbable, because PDF is an unstructured format, and going from unstructured to structured data is a daunting (and often, in theory as well as in practice, impossible) task.
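To illustrate what "unstructured" means here, below is a minimal Python sketch. It assumes a tiny, hand-written, uncompressed page content stream (hypothetical; real streams are usually compressed and use many more operators) and only understands the `Td` (move text position) and `Tj` (show string) operators. The result is a list of text fragments with coordinates. Nothing in it says "this is a heading" or "this is a paragraph"; that has to be guessed from font sizes and positions.

```python
import re

# Hypothetical, hand-written content stream for illustration only.
CONTENT_STREAM = """
BT
/F1 18 Tf
72 720 Td
(Introduction) Tj
/F1 11 Tf
0 -30 Td
(PDF stores glyphs at) Tj
0 -14 Td
(absolute positions.) Tj
ET
"""

# Matches either "tx ty Td" or "(string) Tj"; every other operator is ignored.
TOKEN = re.compile(
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td"
    r"|\(((?:[^()\\]|\\.)*)\)\s*Tj"
)


def positioned_text(stream):
    """Return (x, y, text) tuples for each Tj, tracking only Td offsets."""
    x = y = 0.0
    runs = []
    for m in TOKEN.finditer(stream):
        if m.group(3) is not None:
            # A shown string: record it at the current text position.
            runs.append((x, y, m.group(3)))
        else:
            # Td moves relative to the start of the current line.
            x += float(m.group(1))
            y += float(m.group(2))
    return runs


for run in positioned_text(CONTENT_STREAM):
    print(run)
```

The "heading" is only a larger font at a higher y-coordinate. Any XML structure would have to be inferred from heuristics like these, which is exactly where the difficulty lies.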

There are tools that extract the text from PDF and there are tools that extract the images from PDF. Some also create Word (iirc) and/or RTF documents that keep the layout. Going from RTF to XSL-FO is then rather easy (RTF is text-based), but the result gets extremely bloated (look at the RTF output with all options set: it gets huge for just a couple of pages!). Much of this has to do with the precise positioning inside the PDF. Even then, many objects or properties cannot be extracted at all (borders, backgrounds, alpha channels, overlays, partially embedded fonts).

I don't see a reason why FOP would do such a thing. If PDF can be treated as input, then Word, RTF, TIFF, BMP etc. should also be considered, I guess, which makes it next to impossible. It is such a specialized task (compare OCR) that other tools are better suited to it.

Hope this answers your question,

Cheers,
-- Abel Braaksma
