Hey Rob, that sounds like quite a nice project you have in mind! My two cents: it's not worth carrying out unless you can get the math to import reasonably well. That seems to be the biggest problem with most ways of converting .doc to LyX. I understand it's very difficult, but I think it's also the most important feature.
I don't mean to discourage you; this is only my opinion. I don't think importing track changes is important at all (collaborators should be able to go through the changes in Word and get rid of them first). And I don't think round-tripping is important either. Of course, if you could pull these features off, they would be nice additions.

I looked at the Google Code project, and it looks like it's still under development. Is that correct? It would be nice to choose a library that is still being actively developed.

Good luck with it all, and thanks for your effort on this. I think in the end it would indeed help a lot of would-be LyXers, or already-LyXers who need to collaborate with a Word-er.

Xu

On Wed, Feb 1, 2012 at 2:59 PM, Rob Oakes <rob.oa...@oak-tree.us> wrote:

> Dear Users and Developers,
>
> Some time ago, I was experimenting with importing documents into LyX
> (specifically, how to crack the "import MS Word to LyX" nut). In the
> process, I got really excited about using OpenOffice to convert the Word
> document to HTML, running tidy on the HTML and then importing that way.
> (The original blog article about this can be found at
> http://blog.oak-tree.us/index.php/2010/05/14/msword-lyx-import.)
>
> Since I'm (re)writing a book chapter about this topic, I thought that I
> would look at alternative strategies for importing Word (and other file
> formats) into LyX. While doing research, I came across a (potentially) much
> better solution.
>
> Somewhat recently (in 2010), a group of Python libraries were written that
> handle document conversions. They are part of the epub-tools library
> (http://code.google.com/p/epub-tools/). (I've been experimenting with ePub
> document creation from LyX, which is how I found them.)
>
> One of the tools in the library is able to parse Microsoft Word documents
> and convert them to XHTML in preparation for generating an ePub file. I
> think that the tool can be adapted for directly converting Word docs to
> LyX. Not to LaTeX and then to LyX, but *directly to LyX*.
>
> I'm putting together a library to experiment with direct conversions (this
> is ostensibly being done for the never-ending book project, but will be
> released as open code), but before getting too deep into development, I
> wanted to poll:
>
> 1. Is this a tool that would prove useful to yourselves, your
>    collaborators, and others?
> 2. What features would you consider essential?
>
> (Right now, style-based conversion looks pretty easy -- going from
> Heading 1 in Word to Chapter, for example. But I'm not sure how well it
> would convert maths. This is something I'll still need to look at, and may
> require writing an additional module.)
>
> 3. What is the best tool to look at for guidance in creating a new
>    script for word2lyx? tex2lyx?
> 4. Does the script need to support special cases, such as importing
>    Word "track changes"?
> 5. Just how important do you consider "round-tripping" a document,
>    e.g., going from LyX to Word and back to LyX?
> 6. Is there anyone who might be interested in collaborating on this?
>
> Any thoughts would be greatly appreciated.
>
> Cheers,
>
> Rob Oakes
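
As a rough illustration of the style-based conversion mentioned in the quoted message (Heading 1 in Word to Chapter), here is a minimal Python sketch. Only that one mapping comes from the thread; the other table entries, the function names, and the fallback to the Standard layout are assumptions made up for illustration. It emits LyX layout blocks only, not a complete .lyx file, and it does not escape special characters or attempt math at all.

```python
# Hypothetical mapping from Word paragraph style names to LyX layout names.
# Only "Heading 1" -> "Chapter" comes from the thread; the rest are assumed.
STYLE_TO_LAYOUT = {
    "Heading 1": "Chapter",
    "Heading 2": "Section",
    "Heading 3": "Subsection",
    "Normal": "Standard",
}


def paragraph_to_lyx(style, text):
    """Render one (style, text) pair as a LyX layout block.

    Unknown styles fall back to "Standard" (an assumption, not a rule).
    The text is inserted as-is; no escaping is performed.
    """
    layout = STYLE_TO_LAYOUT.get(style, "Standard")
    return "\\begin_layout {}\n{}\n\\end_layout\n".format(layout, text)


def paragraphs_to_lyx_body(paragraphs):
    """Join converted paragraphs into a body fragment (not a full .lyx file)."""
    return "\n".join(paragraph_to_lyx(style, text) for style, text in paragraphs)


if __name__ == "__main__":
    sample = [
        ("Heading 1", "Introduction"),
        ("Normal", "Some body text."),
    ]
    print(paragraphs_to_lyx_body(sample))
```

A lookup table like this would presumably cover headings and plain paragraphs; the math import discussed above is the part it does nothing for.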