On Fri, 09 Nov 2007 18:30:36 -0700, Andrea Valle <[EMAIL PROTECTED]> wrote:

> After wasting my time with Acrobat's awful PDF-to-HTML converter,
> I discovered this, which you may all know already:
> http://pdftohtml.sourceforge.net/

Looks impressive...

> The HTML conversion is very good, both in the rendered result and
> in the source, but after some tweaking I got interested in the XML
> conversion it also offers.
> The XML format essentially encodes the page-related information;
> typically each line is an element. In addition, bold and italics are
> marked simply as <b> and <i>.
> I'm still struggling to get anything practical out of XML
> processing in ConTeXt, so I switched back to Python.
> I used an incremental SAX parser with some replacements.
> This is today's draft.
> Original:
> http://www.semiotiche.it/andrea/membrana/02%20imp.pdf
>
> Recomposed (no setup at all, only \enableregime[utf]):
> http://www.semiotiche.it/andrea/membrana/02imp.pdf

Looks VERY impressive... Tell me, how did you set up the cropmarks etc.?

> pdf --> pdftoxml --> xml --> python script --> tex --> pdf
>
> I recovered paragraphs, bold, emphasis and footnotes, stripping the
> line-end dashes and reassembling the text with its footnote
> references. Not bad as a first step.

Did you also try pdftohtml --> html --> context?
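
For anyone following along, the XML step of the pipeline quoted above could
be sketched roughly as below. This is not Andrea's script, which is not shown
in this thread; it is a minimal illustration that assumes the XML output wraps
each line in a <text> element inside <page>, with <b> and <i> inline, as
described. The names LineCollector, join_lines and convert are made up for
the example, the mapping to {\bf ...} and {\em ...} is only one possible
choice, and footnote handling is left out entirely.

```python
import xml.sax


class LineCollector(xml.sax.ContentHandler):
    """Collect one string per <text> element, mapping <b>/<i> to ConTeXt groups."""

    # Assumed mapping; the original script's choices are not known.
    OPEN = {"b": "{\\bf ", "i": "{\\em "}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = None  # None while we are outside a <text> element

    def startElement(self, name, attrs):
        if name == "text":
            self._buf = []
        elif name in self.OPEN and self._buf is not None:
            self._buf.append(self.OPEN[name])

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "text" and self._buf is not None:
            self.lines.append("".join(self._buf).strip())
            self._buf = None
        elif name in self.OPEN and self._buf is not None:
            self._buf.append("}")


def join_lines(lines):
    """Rejoin lines into one paragraph, dropping a trailing line-end dash.

    Naive on purpose: a genuine hyphenated compound that happens to be
    split across two lines is merged as well.
    """
    out = ""
    for line in lines:
        if not line:
            continue
        if out.endswith("-"):
            out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


def convert(xml_text, chunk_size=64):
    """Feed the document to the SAX parser in chunks (incremental parsing)."""
    handler = LineCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    data = xml_text.encode("utf-8")
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
    parser.close()
    return join_lines(handler.lines)


# Hand-written sample in the assumed format, not real converter output.
SAMPLE = """<pdf2xml>
<page number="1">
<text top="10" left="5" font="0">A <b>bold</b> word and a hyphen-</text>
<text top="22" left="5" font="0">ated <i>term</i>.</text>
</page>
</pdf2xml>"""

if __name__ == "__main__":
    print(convert(SAMPLE))
```

The handler only buffers text while inside a <text> element, so the
whitespace between elements is ignored, and the dash stripping happens in a
separate pass so that it can be refined without touching the parser.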

> I guess that you XML gurus could probably do this much more easily
> and cleanly. So, just for my very specific needs, I can probably take
> Word sources, convert them to PDF and then finally reach ConTeXt as
> discussed.

Again, very nice stuff!

Best wishes
Idris

-- 
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
