On Feb 5, 2012, at 2:04 AM, Abdelrazak Younes wrote:

> Strong suggestion: use LyX proper. I am quite sure you already know that
> because I saw some patches from you in this area but I'll explain anyway:
> LyX's html own export is so good and fast because it effectively knows the
> in-memory representation of the document. You can't be faster nor more
> accurate than that. I mean, unless you want to rewrite LyX in python.

Extremely good point. I'm also more comfortable with the HTML export available in LyX. I was initially interested in eLyXer because I thought I might be able to use it to help with an import filter as well. I'm not sure that it can, though. As you note in your email, it doesn't create a document model.

> IIUC you want a single module in python for both import and export in python.
> But I don't think this is a valid argument. As for the word to lyx format
> conversion, if you want to use this epub library there must be a way to use
> that in C++ I'm sure…

I thought about using Python because I'd found a tool capable of generating docx for me. After working with it a little more, though, I'm less enamored with it. docx is a pretty straightforward file format, and there are quite a few things in the tool that are sloppily implemented.

> AFAIK, eLyXer doesn't construct a document model. So you'd better spend this
> time reading the C++ code for exporting to html/xhtml ;-)

Following Steve's suggestion, I decided to try the "easy" way and directly parse the XHTML created by eLyXer. It turns out that it's not only easy, but probably the best way forward. There are some excellent libraries for reading XML in Python, and lxml in particular looks like a good solution. You generate the XHTML, parse it with lxml, and then iterate over the elements, translating as you go. My current script is about 50 lines long and works with either LyX's native XHTML or eLyXer's output. To add new features, you add additional cases describing how to translate the XHTML.

Which brings us to an important point: there's already a pretty good LyX -> XHTML -> LibreOffice -> Word pathway for translating documents. Unless I directly implement Word as another backend (which, while a lot of work, isn't difficult), I'm not sure there's much reason for a direct MS Word export. The real need seems to be for an MS Word import, anyway.

Cheers,

Rob
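
For illustration, the parse-and-iterate approach described above might look something like the sketch below. This is not the 50-line script itself. The handler table, the style names ("Heading 1", "Body", and so on), the sample document, and the translate() function are all assumptions made up for the example, and the output is just a list of (style, text) pairs rather than a real docx. It uses lxml when available and falls back to the standard library's ElementTree, which offers the same calls used here.

```python
try:
    from lxml import etree  # the library named in the message
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def local_name(element):
    """Return the tag without its XHTML namespace, or None for non-elements."""
    tag = element.tag
    if not isinstance(tag, str):  # lxml gives comments a non-string tag
        return None
    return tag[len(XHTML_NS):] if tag.startswith(XHTML_NS) else tag


def text_of(element):
    """Collect all text inside an element, collapsing runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


# Hypothetical translation table: one entry per XHTML element we understand.
# Supporting a new element means adding one more case here.
HANDLERS = {
    "h1": lambda el: ("Heading 1", text_of(el)),
    "h2": lambda el: ("Heading 2", text_of(el)),
    "p": lambda el: ("Body", text_of(el)),
    "li": lambda el: ("List Item", text_of(el)),
}


def translate(xhtml):
    """Parse XHTML and return (style, text) pairs in document order."""
    root = etree.fromstring(xhtml)
    output = []
    for element in root.iter():
        handler = HANDLERS.get(local_name(element))
        if handler is not None:  # elements without a case are skipped
            output.append(handler(element))
    return output


# Made-up sample input standing in for the exported XHTML.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<h1>Title</h1><p>Some <em>emphasised</em> text.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    for style, text in translate(SAMPLE):
        print(style, "|", text)
```

A dictionary of handlers keeps the "add another case" step to a single line. One caveat of this simple version: an element nested inside another handled element (a p inside an li, say) would be emitted twice, so a real translator would need to decide which level owns the text.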
