Hi Steve,

Have you already considered importing the Word document into OpenOffice Writer and letting one of the OO->LaTeX converters do the hard work? (http://www.hj-gym.dk/~hj/writer2latex/ is one example; there might be others.)

No, I have never tried or used one of those myself. However, I have heard of success stories -- it might be worth a try.

Daniel



On 18.07.2008, at 21:28, Steve Litt wrote:

Hi all,

I have a 300 page book written in MS Word version 97, and I have to convert it
to LyX in order to make the second edition.

I'll accept all condolences now :-)

Believe it or not, the MS Word version was written very much in what you guys would call WYSIWYM style. I had styles for everything -- almost no appearance was fine-tuned. Obviously it's essential that all those styles transfer over into the LyX version.

So here's my plan, unless someone else has a better idea.

First, I'll export to RTF.

Then in Vim I'll do this:

:%s/}/}\r/g

Now the RTF file will have lines that are somewhat recognizable as markup.

Next I'll look at the \stylesheet part of the RTF, and make a list of all
paragraph and character styles, sort of like this:

\fs20 Normal
\s1 heading 1
\s2 heading 2
\cs10 \additive Default Paragraph Font
\s16 myparagraphstyle
\cs17 mycharstyle
\cs18 mycharstyle2
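
A minimal sketch of how that list could be pulled out with a script instead of by eye. It assumes each stylesheet entry is a flat group of the form {...controls... name;} with lowercase control words, which may not hold for every file Word produces. The function name list_styles is made up for illustration.

```python
import re

def list_styles(stylesheet):
    """Return (style_number, style_name) pairs from the text of an RTF stylesheet group."""
    styles = []
    # Assumption: every style entry is a flat {...;} group with no nested braces.
    for entry in re.findall(r"\{([^{}]*);\}", stylesheet):
        # \sN marks a paragraph style, \csN a character style.
        # Longer control words such as \snext0 or \sbasedon0 do not match.
        num = re.search(r"\\(?:cs|s)\d+", entry)
        # Drop \* and all control words; what remains is the style name.
        name = re.sub(r"\\\*|\\[a-z]+-?\d* ?", "", entry).strip()
        styles.append((num.group(0) if num else "(no number)", name))
    return styles
```

Run on the \stylesheet portion of the exported file, this should give a list close to the one above, with the style number next to each style name.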

Then, within Vim I'll run substitutions so that the text referred to by style numbers such as \s2 is prepended with my own tags such as phdr2, and better yet, so that the text has a proper ending tag appended. This is not so simple for three reasons:

1) There's always a bunch of gobbledygook between the \s2 and the text to
which it refers, and that must eventually be deleted.

2) There's often gobbledygook before the \s2, and that gobbledygook must
eventually be deleted.

3) It's MUCH harder to reliably put end tags at the end of the text to which it refers. If I don't put end tags, that means I'll have a much harder time
converting it to LyX.
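
The tagging step could also be scripted outside Vim. The following is only a sketch under strong assumptions: the paragraph text contains no nested groups or inline control words, control words are lowercase, and each paragraph ends with \par. The [phdr2]...[/phdr2] tag syntax and the name tag_paragraphs are hypothetical. It leaves the control words in place rather than deleting them, so the file should still be valid RTF.

```python
import re

# One RTF control word other than \par, e.g. \sb240 or \keepn (assumed lowercase).
CONTROL = r"\\(?!par(?![a-z]))[a-z]+(?:-?\d+)?(?![a-z0-9])"

def tag_paragraphs(rtf, style, tag):
    r"""Wrap the text of paragraphs that use `style` (e.g. \s2) in [tag]...[/tag]."""
    pattern = re.compile(
        re.escape(style) + r"(?!\d)"        # \s2 must not match \s20
        + r"((?:\s|" + CONTROL + r")*)"     # control words between \s2 and the text
        + r"([^\\{}\s][^\\{}]*)"            # the text itself, assumed free of markup
        + r"(?=\\par(?![a-z]))"             # stop right before the closing \par
    )
    # Keep the style and its control words, and put the tags around the text only.
    return pattern.sub(
        lambda m: f"{style}{m.group(1)}[{tag}]{m.group(2)}[/{tag}]", rtf
    )
```

For example, tag_paragraphs(rtf, r"\s2", "phdr2") would mark every "heading 2" paragraph that fits those assumptions. Paragraphs with inline character formatting would need a real RTF parser.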

Next, I'll re-import the RTF into MS Word. What should happen is that it re-imports the same as it originally was, only now it has my tags. From there I should be able to export it to plain text, and use my tags to create the LyX file with suitable scripts. Or maybe make scripts to directly manipulate the RTF. Of course, for all my custom character and paragraph styles, I'll need to create those styles within LyX, in a blank document, before appending the actual content.
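
A rough sketch of what such a script could look like, assuming the plain-text export has one paragraph per line, tags of the hypothetical form [phdr2]...[/phdr2], and no backslashes in the text. It relies on the .lyx format wrapping each paragraph in \begin_layout NAME ... \end_layout. The tag-to-layout mapping and the name tagged_to_lyx are illustrative only. It produces body paragraphs, not a complete .lyx file.

```python
import re

# Hypothetical mapping from tags to LyX layout names; custom styles would be
# added here once the matching layouts exist on the LyX side.
LAYOUTS = {"phdr1": "Chapter", "phdr2": "Section"}

def tagged_to_lyx(text):
    """Turn tagged plain text (one paragraph per line) into LyX body paragraphs."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between paragraphs
        m = re.fullmatch(r"\[(\w+)\](.*)\[/\1\]", line)
        if m:
            # Unknown tags fall back to the Standard layout.
            layout, body = LAYOUTS.get(m.group(1), "Standard"), m.group(2)
        else:
            layout, body = "Standard", line
        out += [f"\\begin_layout {layout}", body, "\\end_layout", ""]
    return "\n".join(out)
```

The output would then be pasted between \begin_body and \end_body of a blank document saved from LyX, so the header comes from LyX itself.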

Then comes the cleanup. Stuff like tables and images won't convert -- I'll need to manually do that cleanup and then run at least a rough proofread.

The good news is, because the original document used styles for almost every
appearance, fine tuning won't be necessary (hooray for styles!).

I'd estimate this to be about a week's job. That's a lot of time, but in the end I'll have converted a 300 page book, style for style, from MS Word to
LyX.

If anyone has a better idea for converting a 300 page MS Word document to LyX,
style for style and word for word, please let me know.

Thanks

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US
