This morning I got an acceptably tagged text file out of MS Word. From that 
moment on, things got much easier.

I wrote a Perl script to remove the end tags and instead put a start tag on
every line between a start and an end tag. It also made sure there were no
interlinking tag sets, and it put all the start tags in the same, easily
parsable format. I hadn't thought to do that when converting out of MS Word --
I had bigger fish to fry at the time.
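
For anyone wanting to try something similar, here is a minimal sketch of that
tag-propagation step as a shell function wrapping awk. It is not the Perl
script itself. It assumes a hypothetical input format in which a block is
opened by a line reading b_pstyle_NAME::: and closed by a line reading
e_pstyle_NAME:::. The e_ form and the function name propagate_tags are guesses
made up for illustration, and carriage returns are assumed to be gone already.

```sh
#!/bin/sh
# Sketch only. Assumed (hypothetical) input format:
#   b_pstyle_NAME:::     start tag, alone on its line
#   ...paragraph lines...
#   e_pstyle_NAME:::     end tag, alone on its line
# Output: end tags are dropped and every line inside a block carries
# its own start tag. Overlapping or mismatched tags make awk exit 1.
propagate_tags() {
  awk '
    /^b_pstyle_[A-Za-z0-9_]+:::$/ {
      if (cur != "") {
        print "line " NR ": start tag inside an open block" > "/dev/stderr"
        bad = 1
      }
      cur = $0          # remember the open tag
      next
    }
    /^e_pstyle_[A-Za-z0-9_]+:::$/ {
      # the matching end tag is the start tag with "b" swapped for "e"
      if ($0 != "e" substr(cur, 2)) {
        print "line " NR ": end tag does not match open block" > "/dev/stderr"
        bad = 1
      }
      cur = ""          # block closed
      next
    }
    { print cur $0 }    # lines outside any block pass through untagged
    END { exit bad + 0 }
  '
}
```

Lines outside any block come out untagged, which leaves them for a later
filter to mark as Normal.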

I hadn't marked Normal paragraphs, so my program had to deduce which lines 
weren't marked already, and put a b_pstyle_normal::: start tag on them.
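
A minimal sketch of that deduction, assuming every start tag sits at the
beginning of its line in the form b_pstyle_NAME::: (only b_pstyle_normal:::
is confirmed above; the general pattern is inferred from it). The function
name mark_normal and the choice to leave blank lines alone are assumptions.

```sh
#!/bin/sh
# Sketch only: prefix b_pstyle_normal::: to every non-empty line that
# does not already begin with a start tag of the form b_pstyle_NAME:::.
mark_normal() {
  awk '
    /^$/ { print; next }                          # keep blank lines as-is
    /^b_pstyle_[A-Za-z0-9_]+:::/ { print; next }  # already tagged
    { print "b_pstyle_normal:::" $0 }             # untagged, so Normal
  '
}
```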

Armed with proper start tags on every line (which is actually a paragraph), it 
was pretty easy to pipe that through something that added the \begin_layout 
Whatever and \end_layout commands. At this point I have NOT removed the start 
or end tags -- I want some redundancy for checking. I also added a little C
program to get rid of the '\015' (carriage return) characters that DOS put in.
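
Both of those steps can be sketched as small shell filters. This is an
illustration, not the C program or the actual layout filter: tr stands in for
the C program, and the names strip_cr and add_layout, the tag pattern
b_pstyle_NAME:::, and the use of the tag name directly as the LyX layout name
are all assumptions.

```sh
#!/bin/sh
# Sketch only. strip_cr deletes every carriage return (octal 015).
strip_cr() {
  tr -d '\015'
}

# add_layout wraps each tagged line in \begin_layout NAME ... \end_layout,
# leaving the tag in place as a redundancy check. Using the tag name
# directly as the layout name is an assumption.
add_layout() {
  awk '
    match($0, /^b_pstyle_[A-Za-z0-9_]+:::/) {
      # "b_pstyle_" is 9 characters and ":::" is 3, so the name starts
      # at position 10 and is RLENGTH - 12 characters long.
      name = substr($0, 10, RLENGTH - 12)
      print "\\begin_layout " name
      print $0
      print "\\end_layout"
      print ""          # blank line between paragraphs
      next
    }
    { print }           # untagged lines pass through unchanged
  '
}
```

With hypothetical file names, the two would chain as
strip_cr < tagged.txt | add_layout > body.txt.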

I made a layout with dummy styles for each style I used (sort -u came in very 
handy for this).
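
One way sort -u can produce that list of styles, sketched under the assumption
that every start tag begins its line in the form b_pstyle_NAME::: (the
function name list_styles is made up for illustration):

```sh
#!/bin/sh
# Sketch only: print each distinct style name used in the tagged text,
# one per line, assuming start tags of the form b_pstyle_NAME:::.
list_styles() {
  sed -n 's/^b_pstyle_\([A-Za-z0-9_]*\):::.*$/\1/p' | sort -u
}
```

Each name printed is one dummy style the layout file needs.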

Anyway, my program can make the body of a LyX file. All the
Part/Chapter/Section etc. styles work perfectly, and all the other paragraph
styles seem to be working. It's basically a pipeline of little filters that
creates a LyX file from the text file, and I can rerun it over and over to my
heart's content.

I imagine tomorrow I'll add the code to handle character styles, and start 
making my layout file create effects that look how they're supposed to. That 
will help in looking at the produced PDF (it already produces a PDF, so the 
basic code is correct).

Bottom line, I now have a text file with tags representing all my document's
original styles, and I've written Perl, awk, sed and C code to convert it to a
LyX document with those styles preserved.

Anyway, thanks for all the help.

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US
