Hey Rob, that sounds like quite a nice project you have in mind!

My two cents: it's not worth carrying out if you can't get the math to
import reasonably well. That seems to be the biggest problem with most ways
of converting Word documents to LyX. I understand it's very difficult, but I
think it's also the most important feature.

I don't mean to discourage you; that's just my opinion. I don't think
importing track changes is important at all (users should be able to go
through the changes in Word and get rid of them). And I don't think
round-tripping is important. Of course, if you could pull these features
off, they would be nice additions.

I looked at the Google Code project and it looks like it's still under
development. Is that correct? It would be nice to choose a library that is
still being actively developed.

Good luck with it all and thanks for your effort on this. I think in the
end it would indeed help a lot of would-be LyXers, as well as existing
LyXers who need to collaborate with a Word user.

Xu

On Wed, Feb 1, 2012 at 2:59 PM, Rob Oakes <rob.oa...@oak-tree.us> wrote:

>  Dear Users and Developers,
>
> Some time ago, I was experimenting with importing documents into LyX
> (specifically, how to crack the MS Word to LyX import nut). In the
> process, I got really excited about using OpenOffice to convert the Word
> document to HTML, running tidy on the HTML, and then importing that way.
> (The original blog article about this can be found at
> http://blog.oak-tree.us/index.php/2010/05/14/msword-lyx-import.)
>
> Since I'm (re)writing a book chapter about this topic, I thought that I
> would look at alternative strategies for importing Word (and other file
> formats) into LyX. While doing research, I came across a (potentially) much
> better solution.
>
> Somewhat recently (in 2010), a group of Python libraries were written that
> handle document conversions. They are part of the epub-tools library (
> http://code.google.com/p/epub-tools/). (I've been experimenting with ePub
> document creation from LyX, which is how I found them.)
>
> One of the tools in the library is able to parse Microsoft Word documents
> and convert them to XHTML in preparation for generating an ePub file. I
> think that the tool can be adapted for directly converting Word docs to
> LyX. Not to LaTeX and then to LyX, but *directly to LyX*.
>
> I'm putting together a library to experiment with direct conversions (this
> is ostensibly being done for the never-ending book project, but will be
> released as open code), but before getting too deep into development, I
> wanted to poll:
>
>    1. Is this a tool that would prove useful to yourselves, your
>    collaborators, and others?
>    2. What features would you consider essential?
>
>    (Right now, style-based conversion looks pretty easy -- going from
>    Heading 1 in Word to Chapter, for example. But I'm not sure how well it
>    would convert maths. This is something I'll still need to look at, and may
>    require writing an additional module.)
>
>    3. What is the best tool to look at for guidance in creating a new
>    script for word2lyx? tex2lyx?
>    4. Does the script need to support special cases, such as importing
>    Word "track changes"?
>    5. Just how important do you consider "round-tripping" a document,
>    e.g., going from LyX to Word and back to LyX?
>    6. Is there anyone who might be interested in collaborating on this?
>
> Any thoughts would be greatly appreciated.
>
> Cheers,
>
> Rob Oakes
>
