Steve:

Thanks for the reply. I think I was a little unclear regarding the roundtrip 
aspect. 
I am not so interested in a full round trip (though it would be nice to have); 
primarily I want to go one way, from Word into DocBook. 
However, the stylesheets I found happened to be in the 1.75.2/roundtrip 
directory. 

I do appreciate the info on the docx files. I knew that the Microsoft Word 
.docx format is essentially a zip file with a whole bunch of XML files in 
it, but was not aware that the main document content is in document.xml. 
I have looked at that file; it is still really ugly straight out of Word, but I 
am going to concentrate on refining that document down and see if that yields 
any better results. Investigation ongoing.
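
For anyone following along, here is a minimal sketch of pulling that file out of 
a .docx and indenting it so it is easier to read. It assumes the content lives 
at word/document.xml inside the archive (as described in the reply quoted 
below); the function names are made up for illustration, and only the Python 
standard library is used.

```python
import zipfile
import xml.dom.minidom


def read_document_xml(docx):
    """Return the raw bytes of word/document.xml.

    `docx` may be a path to a .docx file or an open binary file object.
    """
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml")


def pretty_print(xml_bytes):
    """Re-indent the XML so the element structure is readable."""
    return xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  ")
```

Calling pretty_print(read_document_xml("mydoc.docx")) on a real file (the file 
name here is a placeholder) gives an indented view of the markup, which can 
help when deciding what to strip out before further processing.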

Another option I tried was OpenOffice.org Writer, since it has a "save as 
DocBook XML" option. It seemed promising, but I am not really impressed with 
the resulting XML. 
I tried a couple of approaches. One was to save the document in Word as .odt, 
then save it from OpenOffice Writer as DocBook XML. 
The other was to just open the Word doc in OpenOffice, let OpenOffice 
convert it, and then save as DocBook. 
That approach generated an XML document that validated, but there are still 
some really ugly things in it that I would like to see improved. 
I wrote a few custom XSLT stylesheets to clean up some of it, and that is also 
a work in progress.
I would love to hear some other suggestions/options if anyone has already gone 
through all this! 

If you don't mind sharing it once you have a decent round-trip scenario worked 
out, just e-mail me directly or post a link here. As I said initially, I 
got lots of advice from old discussions here in the archives; who knows, 
perhaps this can help someone else down the line. 
It would be very nice to have.  

Thank you very much for your kind reply! 
/Greg  






-----Original Message-----
From: Steve Ball <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Thu, May 6, 2010 6:49 pm
Subject: Re: [docbook-apps] Word 2007+ to DocBook 


Hi,


The stylesheet structure was rationalised for the 1.75.2 release so that Word, 
Pages and OpenOffice formats could all be supported. There is a stylesheet for 
each of those formats that normalises the document to a common format, and then 
the other stylesheets take the document through to structured DocBook.
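
A rough sketch of how such a chain might be driven from a script is below. The 
stage order is an assumption inferred from the stylesheet file names mentioned 
later in this thread (normalise2sections.xsl, sections2blocks.xsl, 
blocks2dbk.xsl), and the format-specific normaliser is left as a parameter 
because its file name is not given here. The function name and the 
intermediate file names are hypothetical. The commands use the form 
"xsltproc -o OUTPUT STYLESHEET INPUT"; the sketch only builds the command 
lines and does not run them.

```python
import os

# Assumed order of the format-independent stages (inferred from file names).
COMMON_STAGES = ["normalise2sections.xsl", "sections2blocks.xsl", "blocks2dbk.xsl"]


def build_pipeline(normaliser, source, final_output, stylesheet_dir="."):
    """Return one xsltproc argument list per stage.

    Each stage reads the previous stage's output; intermediate results go to
    stage1.xml, stage2.xml, ... (hypothetical names) in the working directory.
    """
    stages = [normaliser] + COMMON_STAGES
    commands = []
    current = source
    for index, sheet in enumerate(stages):
        is_last = index == len(stages) - 1
        output = final_output if is_last else "stage%d.xml" % (index + 1)
        commands.append(
            ["xsltproc", "-o", output, os.path.join(stylesheet_dir, sheet), current]
        )
        current = output
    return commands
```

Each returned list could be passed to subprocess.run(cmd, check=True) in turn, 
assuming xsltproc is installed and the stylesheets are in stylesheet_dir.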


Office 2007 basically uses WordML under the hood, and a .docx "file" is really 
just a Zip file containing the XML documents. The one with the document content 
is word/document.xml. It wouldn't be too much work to upgrade the roundtrip 
stylesheets to handle this document; basically it is just the XML Namespace 
URIs that have changed.
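
To illustrate the namespace point, here is a small sketch that reports which 
namespace a document's root element uses and collects paragraph text. The two 
URIs are the Word 2003 WordML namespace and the Word 2007 WordprocessingML 
main namespace; the function names are invented for this example, and it only 
demonstrates the namespace difference, not a full conversion.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 WordML documents.
WORDML_2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
# Namespace used by word/document.xml inside a Word 2007 .docx.
WORD_2007_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def root_namespace(xml_bytes):
    """Return the namespace URI of the root element, or '' if it has none."""
    root = ET.fromstring(xml_bytes)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def paragraph_texts(xml_bytes, ns):
    """Return the text of each p element in namespace `ns`.

    The text of every t element inside a paragraph is concatenated.
    """
    root = ET.fromstring(xml_bytes)
    texts = []
    for para in root.iter("{%s}p" % ns):
        texts.append("".join(t.text or "" for t in para.iter("{%s}t" % ns)))
    return texts
```

A script could call root_namespace() first and then pass the matching constant 
to paragraph_texts(), so the same code handles either generation of the format.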


I'm working on libxslt at the moment (implementing XSLT 2.0), so haven't really 
got time to look at the roundtripping stuff. However, email me directly if you 
have any further questions.


Cheers,
Steve Ball


On 07/05/2010, at 6:29 AM, [email protected] wrote:


Howdy DocBook Community: 

I am new to DocBook, and also new to this forum. I have been going through the 
archives, and found some very interesting discussions. Primarily I am 
interested in converting some documents from Word, in which they were 
authored, to DocBook. 
I have been looking at several tools to help in this process, and found some 
very good information here in the archives. 

One method which seems very promising is docbook-xsl/roundtrip. 
The discussion of it was from a few years ago, so I am thinking that 
some of the stylesheets may have changed in the docbook-xsl-1.75.2 distro 
that I have. The suggested conversions were:
 
 wordml-normalise.xsl, wordml-sections.xsl, wordml-blocks.xsl, wordml-final.xsl
 
none of which I found in 1.75.2. 
Instead I have stylesheets such as: 
normalise-common.xsl, normalise2sections.xsl, sections2blocks.xsl, and 
blocks2dbk.xsl
 
It seems to me that these are just the logical evolution of the same XSL 
stylesheets referenced in the archives from years ago. Does anyone know if 
this is indeed the case? 
 
Further, there has been little to no discussion of converting Microsoft Word 
to DocBook, or apparently any new tools for it, for quite a while. That 
corresponds roughly to the time when Microsoft Word started implementing XML, 
or w:xml as I like to call it. It is still very ugly XML, and even though the 
new docx format is apparently valid XML, it is still cumbersome to work with, 
at least in my opinion. 
Are there any newer tools designed primarily to work with the latest 
incarnation of w:xml, or any techniques that could help the effort to get 
these docs into DocBook?
I greatly appreciate any response!
 
Thanks,
/GregP    


