Antonio Fiol Bonnin wrote:

> Thank you, Con, for your very interesting point of view. We were
> working on (a) but I have told my team that we will be changing
> approach in one hour if they do not see a clear end.
>
> Other than that, I will look into pdftohtml (is it really html?).

http://pdftohtml.sourceforge.net/

It can produce HTML or XML. The XML is closer in form to the content of the
PDF: it has pages containing text with typographic and positional
formatting. The HTML has some of the formatting information removed (I
think), and some kind of guesswork is used to stick lines of text back into
paragraphs.
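
If you go the XML route, here is a rough Python sketch of reading the
positioned text back out, page by page. The sample document, the function
name lines_by_page, and the element and attribute names (page, text, top,
left, number) are my assumptions about what the XML looks like, so check
them against a file you have actually generated before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that imitates the XML output; the element and
# attribute names are assumptions, not taken from a real run.
SAMPLE = """<pdf2xml>
  <page number="1" height="1188" width="918">
    <text top="120" left="90" width="200" height="16" font="0">Second line</text>
    <text top="100" left="90" width="180" height="16" font="0">First <b>line</b></text>
  </page>
</pdf2xml>"""


def lines_by_page(xml_text):
    """Return {page number: [text, ...]} ordered top-to-bottom, left-to-right."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        items = []
        for node in page.iter("text"):
            # itertext() also collects text nested in inline tags such as <b>.
            content = "".join(node.itertext())
            items.append((int(node.get("top")), int(node.get("left")), content))
        # Sort by vertical position first, then horizontal position.
        items.sort()
        pages[int(page.get("number"))] = [content for _, _, content in items]
    return pages


if __name__ == "__main__":
    for number, lines in lines_by_page(SAMPLE).items():
        print(number, lines)
```

Sorting on the coordinates only restores reading order for simple
single-column pages; anything with columns or tables would need more work.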
