On Sat, 19 Jul 2008, Steve Litt wrote:

Believe me, it's not easy at all. RTF is much too weird. The style identifier occurs in the middle of a complex string. The text to which it applies occurs at the end, but there's no reasonable, consistent way to identify where the markup ends and the content begins.

You could identify manually where and what to replace, but instead of making the change by hand, you create a command in a script that makes the change. Maybe you wouldn't use a regexp-replace in this case.

To be specific, I'm suggesting separate commands/lines in the script that each do only one simple, concrete replacement. Perhaps even one command for each replacement...

The advantage is that you can re-run the script, or run it partially, if you later get into problems.

More importantly, you can start over if you discover later in the process that there is a problem with the sequence of regexp replacements you've used.
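A minimal sketch of the kind of script I have in mind, assuming Python. The RTF fragments, the step labels and the names STEPS, apply_steps and run are all made up for illustration; they are not taken from the actual document. Each step is one plain, exact replacement, and the original file is only ever read, so the whole sequence (or just part of it) can be re-run from scratch.

```python
from pathlib import Path

# One entry per replacement: (label, exact old text, exact new text).
# These RTF fragments are invented placeholders for illustration only.
STEPS = [
    ("intro style", r"{\s1 Intro}", r"{\s2 Intro}"),
    ("details style", r"{\s1 Details}", r"{\s3 Details}"),
]


def apply_steps(text, steps, first=0, last=None):
    """Apply steps[first:last] in order as plain (non-regexp) replacements."""
    for label, old, new in steps[first:last]:
        count = text.count(old)
        # Fail loudly unless the target occurs exactly once, so a wrong
        # or stale step is noticed instead of silently doing nothing.
        if count != 1:
            raise ValueError(f"step {label!r}: expected 1 match, found {count}")
        text = text.replace(old, new)
    return text


def run(src, dst, steps=STEPS, first=0, last=None):
    """Read src (never modified), apply the steps, write the result to dst."""
    # latin-1 maps every byte to one character, so the file round-trips
    # unchanged apart from the replacements themselves.
    text = Path(src).read_bytes().decode("latin-1")
    result = apply_steps(text, steps, first, last)
    Path(dst).write_bytes(result.encode("latin-1"))
```

Something like run("original.rtf", "tweaked.rtf", last=1) would then apply only the first step, which is handy when checking the result in Word after each change. The file names here are hypothetical too.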

I used a series of files, each of which contains one tweak type, so that shouldn't happen. At the end of each tweak type I verify that it still loads and looks right in MS Word.

You could also use version control and commit after each tweak type.

Writing a program would have been an excellent idea, but RTF is MUCH too weird to write that program in anything resembling a reasonable timeframe.

I don't think I'm suggesting a program in that sense (at most the script would be a hack..:-). However, if you really wanted to write a program, I'd start with something that's capable of parsing RTF.

Btw... googling for 'converting rtf to html css'
http://www.google.com/search?hl=en&client=opera&rls=en&hs=1CL&q=converting+rtf+to+html+css&btnG=Search
gives some results that look useful, but I guess they might all remove
your styles :-(

Actually, you probably want 'convert rtf to xml'. Googling gave this link:
        http://www.rtf-to-xml.com/
where they claim:
        RTF TO XML is a handy solution to convert your RTF documents
        (created, for example, in Microsoft® Word) into custom XML
        formats, preserving their appearance and internal structure.

/Christian

--
Christian Ridderström, +46-8-768 39 44            http://www.md.kth.se/~chr
