Hi Jeff,
Any conversion from Word to XML (DocBook or another schema) depends heavily
on how well your Word files are structured by means of styles, so you
will probably need to massage and fix the files in Word by hand before the
actual conversion can be done effectively.

Unfortunately, the style features offered by Word are quite a mess and
(IMHO) not robust enough, so more often than not you end up with quite
"dirty" files. I have found that the time spent in Word can often be better
used adding tags manually in an XML editor. For this I use Oxygen in Author
mode, as it has a *very* useful feature: a quite intelligent paste from
Word, where many inlines and sectioning elements get translated
automatically to DocBook. This way you can just do a quick cleanup in Word
(search & replace based on styles is your friend here) to keep only the
inlines, the section titles, lists and tables, and then copy & paste from
Word into a blank DocBook file in Oxygen (Oxygen has templates for both
DB4 and DB5). You will end up with all the paras, sections and main
structures already tagged, and from this point on you can work directly in
a structured editing environment to finalize the markup. I don't remember
whether footnotes get converted correctly, but you can do a quick test on
this.
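
For that kind of quick test, a minimal sketch of how you could count what
actually survived the paste, assuming Python's standard
xml.etree.ElementTree and a DocBook 5 file in the
http://docbook.org/ns/docbook namespace. The sample document and the
function name count_elements are made up for illustration; this is not
something Oxygen provides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# DocBook 5 namespace; DocBook 4 files have no namespace and would need
# a different check.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

def count_elements(xml_text):
    """Count DocBook elements by local name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    prefix = "{" + DOCBOOK_NS + "}"
    for el in root.iter():
        if el.tag.startswith(prefix):
            counts[el.tag[len(prefix):]] += 1
    return counts

# Made-up sample standing in for a file saved after the paste.
SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Test</title>
  <section>
    <title>Intro</title>
    <para>Some text<footnote><para>A note.</para></footnote>.</para>
  </section>
</article>"""

if __name__ == "__main__":
    counts = count_elements(SAMPLE)
    for name in ("section", "para", "footnote", "table"):
        print(name, counts[name])
```

Comparing the footnote and table counts with what you see in the Word
source should tell you quickly whether anything was dropped.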

Your mileage may vary depending on the complexity of your source files, but
this "manual" approach is often the quickest and most accurate, strange as
it may seem.
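
As for the dbk prefixes you mention: a namespace prefix is only a label, so
dbk:para and an unprefixed para in the default DocBook namespace mean the
same thing to an XML parser, and removing the prefix is purely cosmetic. A
minimal sketch of one way to re-serialize a file without it, assuming
Python's standard xml.etree.ElementTree and assuming the roundtrip output
binds dbk to http://docbook.org/ns/docbook (I have not checked this against
your files; the sample document and the function name strip_prefix are made
up for illustration):

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
XLINK_NS = "http://www.w3.org/1999/xlink"

def strip_prefix(xml_text):
    """Re-serialize with DocBook as the default namespace (hypothetical helper)."""
    # Registering an empty prefix makes ElementTree write DocBook elements
    # unprefixed, with a single xmlns="..." declaration on the root.
    ET.register_namespace("", DOCBOOK_NS)
    # Keep XLink attributes readable instead of getting a generated ns0 prefix.
    ET.register_namespace("xlink", XLINK_NS)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")

# Made-up sample standing in for the roundtrip stylesheet output.
SAMPLE = """<dbk:article xmlns:dbk="http://docbook.org/ns/docbook" version="5.0">
  <dbk:title>Test</dbk:title>
  <dbk:para>Hello</dbk:para>
</dbk:article>"""

if __name__ == "__main__":
    print(strip_prefix(SAMPLE))
```

Note that ElementTree drops comments and processing instructions when it
parses with the default settings, so keep the original file around if those
matter to you.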

__peppo



On Thu, May 24, 2012 at 7:38 AM, Jeff Powanda <[email protected]> wrote:

> What’s the easiest way to convert MS Word 2007 documents to DocBook 5
> XML?
>
> I’ve tried using the DocBook roundtrip stylesheets. They seemed to work OK
> if I did the following:
>
> 1. Copied the DocBook styles in template.dot to the document.
> 2. Applied the DocBook styles to the document.
> 3. Saved the document as a Word 2003 XML file.
> 4. Converted the Word 2003 XML file to DocBook 5 XML.
>
> This worked OK, but it was a lot of work to apply the DocBook styles to
> the document (and there are several documents to convert). Also, the
> resulting DocBook XML file has dbk namespace prefixes on all the elements.
> How do I remove them?
>
> I’m not interested in the roundtrip aspect of the roundtrip stylesheets. I
> just want to get Word content into DocBook 5.
>
> Regards,
>
> Jeff Powanda
> Vocera Communications, Inc.
