On 6 July 2015 at 13:41, Peter Kelly <pmke...@apache.org> wrote:

> I haven’t had a chance to look into this yet (and likely won’t for a few
> days), but I strongly suspect it’s related to the way in which we serialise
> the OPC relationships for the document in the OOXML filter (OPC = Open
> Packaging Conventions). The file to look in for this is
> DocFormats/filters/ooxml/src/common/OPC.c.
>
> This is one of the few instances where we actually completely replace
> something in the docx file every time it is modified. The OPC specifies a
> set of XML files that indicate relationships between different “parts”
> (i.e. files) in a package. They’re used as an alternative to path names (I
> don’t know why, it seems unnecessary, but that’s how it’s done in OOXML).
>
> I think there are two likely possibilities:
>
> 1. OpenOffice is too strict in what it accepts from the OPC relationship
> files, and handles only a subset of possible valid relationships
> (presumably whatever MS Office writes out).
>

AOO has very limited OOXML functionality. LibreOffice has a far more
mature implementation.
I have several OOXML files that crash AOO or are not rendered correctly
in it. There was a subproject in AOO to replace the current
implementation, but the people who worked on it (IBM) got sidetracked.

So in short, if AOO has problems with OOXML it is no surprise, and it
does not signal that the document is wrong.

Ian@ you are a committer now, so you can get an MSDN subscription gratis,
which allows you to download (for testing purposes) a lot of Microsoft
products.

rgds
jan i

>
> 2. Corinthia is too liberal in writing out the relationships, in that it
> does so in a way that, while accepted by MS Office and some other apps,
> isn’t strictly in accordance with the spec.
>
> I suspect it’s likely the former, but I’m not infallible and it could be
> the latter ;)
>
> If you unzip a .docx file, have a look at the files in _rels and
> word/_rels - these are the OPC files that would differ and are likely what
> OO for whatever reason is struggling with.
>
> --
> Dr Peter M. Kelly
> pmke...@apache.org
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>
> > On 5 Jul 2015, at 6:22 pm, Ian C <i...@amham.net> wrote:
> >
> > As an addendum to this: I discovered I can open the docx document both
> > before and after processing with Calligra Words.
> >
> > I then tested my scenario of editing the text in a paragraph and using
> > the put command.
> > The new text does not appear. Does something else need to be changed
> > to pick up the edit?
> >
> >
> > ---------- Forwarded message ----------
> > From: Ian C <i...@amham.net>
> > Date: Sun, Jul 5, 2015 at 5:18 PM
> > Subject: Word round trip issue? And round trip in general.
> > To: dev <dev@corinthia.incubator.apache.org>
> >
> >
> > Hi
> >
> > I have a test docx file used to test the Calibre word plugin.
> >
> > I can read the docx file using OpenOffice 4.0.
> >
> > Then I used dfconvert get to convert it to HTML.
> > And the corresponding put to get it back, with no edits or anything
> > done to the HTML.
> >
> > The document is no longer readable by OpenOffice.
> >
> > I don't have Word on this unix system so can't see if Word could still
> > read it.
> >
> > Has something gone wrong, or are my expectations incorrect?
> >
> > I was trying this to see how the Word converter handled, say, editing
> > the text within an HTML document.
> >
> > And I have some of the mechanics of odt doing a round trip. In fact my
> > test document can be written to HTML and read back again, although no
> > real work is being done; it is really just a copy of the original.
> > Which leads me to wonder what scenarios I should be looking at. I was
> > going to start with a simple text edit.
> >
> > --
> > Cheers,
> >
> > Ian C
> >
> >
> > --
> > Cheers,
> >
> > Ian C
> >
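
For anyone wanting to follow Peter's suggestion of looking at _rels and word/_rels without unzipping by hand, here is a minimal sketch in Python. It is not part of Corinthia or dfconvert. The function name list_relationships is made up for illustration, and the sketch assumes the relationship parts use the standard OPC package relationships namespace and that a missing TargetMode means "Internal".

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts (assumed: the standard package
# relationships namespace used by OOXML documents).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_relationships(docx_path):
    """Hypothetical helper: map each .rels part in the package to a sorted
    list of (Id, Type, Target, TargetMode) tuples.

    docx_path may be a file name or a binary file-like object.
    """
    result = {}
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Relationship parts live in _rels folders and end in .rels,
            # e.g. _rels/.rels and word/_rels/document.xml.rels.
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            rels = []
            for rel in root.findall("{%s}Relationship" % RELS_NS):
                rels.append((
                    rel.get("Id", ""),
                    rel.get("Type", ""),
                    rel.get("Target", ""),
                    # Assumption: an absent TargetMode is treated as Internal.
                    rel.get("TargetMode", "Internal"),
                ))
            result[name] = sorted(rels)
    return result
```

Calling list_relationships on the docx before and after the get/put round trip and comparing the two dictionaries should show whether relationships were added, dropped, or rewritten (for example a changed Target path or Id). That only narrows down where the files differ; it does not say which of the two possibilities above is the cause.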