Dear Users and Developers,

Some time ago, I was experimenting with importing documents into LyX
(specifically, how to crack the MS Word to LyX import nut). In the
process, I got really excited about using OpenOffice to convert the Word
document to HTML, running tidy on the HTML, and then importing the result.
(The original blog article about this can be found at
http://blog.oak-tree.us/index.php/2010/05/14/msword-lyx-import.)

Since I'm (re)writing a book chapter about this topic, I thought that I
would look at alternative strategies for importing Word (and other file
formats) into LyX. While doing research, I came across a (potentially)
much better solution.

Somewhat recently (in 2010), several Python libraries were written
that handle document conversions. They are part of the epub-tools
library (http://code.google.com/p/epub-tools/). (I've been experimenting
with ePub document creation from LyX, which is how I found them.)

One of the tools in the library is able to parse Microsoft Word
documents and convert them to XHTML in preparation for generating an
ePub file. I think that the tool can be adapted for directly converting
Word docs to LyX. Not to LaTeX and then to LyX, but /directly to LyX/.

I'm putting together a library to experiment with direct conversions
(this is ostensibly being done for the never-ending book project, but
will be released as open code), but before getting too deep into
development, I wanted to poll the list:

 1. Is this a tool that would prove useful to yourselves, your
    collaborators, and others?
 2. What features would you consider essential?

    (Right now, style-based conversion looks pretty easy: going from
    Heading 1 in Word to Chapter in LyX, for example. But I'm not sure
    how well it would convert maths. This is something I still need to
    look at, and it may require writing an additional module.)

 3. Which existing tool is the best model to follow when writing a new
    word2lyx script? tex2lyx?
 4. Does the script need to support special cases, such as importing
    Word "track changes"?
 5. Just how important do you consider "round-tripping" a document
    (e.g., going from LyX to Word and back to LyX)?
 6. Is there anyone who might be interested in collaborating on this?
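
As a rough illustration of the style-based conversion mentioned in item
2, here is a minimal Python sketch. It assumes the Word paragraphs have
already been extracted as (style name, text) pairs. The STYLE_TO_LAYOUT
table, both function names, and the particular style-to-layout choices
are hypothetical, and escaping of special characters is ignored. It only
produces the layout blocks for the document body, not a complete .lyx
file, so please read it as the shape of the idea rather than a working
word2lyx.

```python
# Hypothetical mapping from Word paragraph style names to LyX layout names.
# Both the style names and the target layouts are assumptions made for
# illustration; a different document class may need a different mapping.
STYLE_TO_LAYOUT = {
    "Title": "Title",
    "Heading 1": "Chapter",
    "Heading 2": "Section",
    "Heading 3": "Subsection",
}

# Any style without an entry above falls back to an ordinary paragraph.
DEFAULT_LAYOUT = "Standard"


def paragraph_to_lyx(style, text):
    """Wrap one paragraph's text in a LyX layout block.

    No escaping is done here, so text containing backslashes or other
    special characters would need extra handling.
    """
    layout = STYLE_TO_LAYOUT.get(style, DEFAULT_LAYOUT)
    return "\\begin_layout {}\n{}\n\\end_layout\n".format(layout, text)


def paragraphs_to_lyx(paragraphs):
    """Convert (style, text) pairs into consecutive LyX layout blocks."""
    return "\n".join(paragraph_to_lyx(style, text) for style, text in paragraphs)


if __name__ == "__main__":
    sample = [
        ("Heading 1", "Importing Documents"),
        ("Body Text", "Some ordinary paragraph text."),
    ]
    print(paragraphs_to_lyx(sample))
```

Keeping the mapping in a plain dictionary would let users adjust it per
document class without touching the conversion code.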

Any thoughts would be greatly appreciated.

Cheers,

Rob Oakes
