On Fri, 09 Nov 2007 18:30:36 -0700, Andrea Valle <[EMAIL PROTECTED]> wrote:
> After wasting my time with an awful pdf to html converter by
> Acrobat, I discovered this, you may all know:
> http://pdftohtml.sourceforge.net/

Looks impressive...

> The html conversion is very very good in resulting rendering and
> also in sources, but after some tweaking I got interested in the xml
> conversion it allows.
> The xml format essentially encodes the page-related information;
> typically each line is an element. Plus, bold and italics are
> marked simply as <b> and <i>.
> I'm still struggling to get anything really working with XML
> processing in ConTeXt, so I switched back to Python.
> I used an incremental sax parser with some replacement.
> This is today's draft.
> Original:
> http://www.semiotiche.it/andrea/membrana/02%20imp.pdf
>
> Recomposed (no setup at all, only \enableregime[utf]):
> http://www.semiotiche.it/andrea/membrana/02imp.pdf

Looks VERY impressive... Tell me, how did you set up the cropmarks etc.?

> pdf --> pdftoxml --> xml --> python script --> tex --> pdf
>
> I recovered par, bold, em, footnotes, stripping dashes and
> reassembling the text with footnote references.

Not bad as a first step. Did you also try pdftohtml --> html --> context?

> I guess that you xml gurus could probably do it much more easily and
> cleanly. So, I mean, just for my very specific needs, I can probably
> take word sources, convert to pdf and then finally reach ConTeXt as
> discussed.

Again, very nice stuff!

Best wishes
Idris
--
Professor Idris Samawi Hamid, Editor-in-Chief
International Journal of Shi`i Studies
Department of Philosophy
Colorado State University
Fort Collins, CO 80523
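
For readers who want to try the xml --> python script --> tex step quoted above, here is a minimal sketch. It is not Andrea's script. It assumes the XML has one <text> element per line, with inline <b> and <i>, as the quoted description suggests. The names LineJoiner, convert and SAMPLE are hypothetical, and the sample input is made up for illustration. Paragraph and footnote recovery are left out; only bold, emphasis and the stripping of line-end dashes are shown.

```python
import xml.sax

# Characters that are special in TeX, mapped to escaped forms.
TEX_ESCAPES = str.maketrans({
    "\\": r"\backslash ",
    "{": r"\{", "}": r"\}",
    "%": r"\%", "#": r"\#", "&": r"\&", "$": r"\$", "_": r"\_",
})


class LineJoiner(xml.sax.ContentHandler):
    """Collect <text> lines, map <b>/<i> to ConTeXt groups, rejoin split words."""

    OPEN = {"b": r"{\bf ", "i": r"{\em "}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished lines, already converted
        self.current = None  # pieces of the line being read; None outside <text>

    def startElement(self, name, attrs):
        if name == "text":
            self.current = []
        elif name in self.OPEN and self.current is not None:
            self.current.append(self.OPEN[name])

    def characters(self, content):
        # Text outside <text> elements (layout whitespace) is ignored.
        if self.current is not None:
            self.current.append(content.translate(TEX_ESCAPES))

    def endElement(self, name):
        if name == "text":
            self.lines.append("".join(self.current).strip())
            self.current = None
        elif name in self.OPEN and self.current is not None:
            self.current.append("}")

    def to_context(self):
        out = ""
        for line in self.lines:
            if out.endswith("-"):
                out = out[:-1] + line  # naive: treats every line-end dash as a word break
            elif out:
                out += " " + line
            else:
                out = line
        return out


def convert(xml_bytes):
    handler = LineJoiner()
    xml.sax.parseString(xml_bytes, handler)
    return handler.to_context()


# Made-up input in the assumed one-<text>-per-line layout.
SAMPLE = b"""<pdf2xml>
<page number="1">
<text top="10" left="10" font="0">A <b>bold</b> claim about semi-</text>
<text top="24" left="10" font="0">otics and <i>meaning</i> at 100%.</text>
</page>
</pdf2xml>"""

print(convert(SAMPLE))
```

The dash handling is deliberately crude: it would also glue together genuine hyphenated compounds that happen to break at a line end, so a real script would need a word list or some manual review.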