RE: OOXML

Dennis E. Hamilton Sat, 02 Aug 2014 13:06:06 -0700

In line with the sketch that Peter Kelly provides below, I am personally very 
sympathetic to the idea of having an internal model that can tolerate 
difference in format between input and output while preserving in the output 
everything from the input format it can, even by leaving markers that will be 
useful on future input of the produced form.  (There is a well-known case of 
Microsoft Office doing this for HTML it exports, although the added information 
for recovery of the MSO rendition led to many complaints about document bloat.)


There are some conflicts between the desire to do this and the fact that some 
alterations have non-local consequences and may have other effects.  I still 
support the idea, but there are some tricky cases, including

- Changes that overlap or conflict with tracked changes, where the tracked 
changes are not updated or preserved properly
- Accessibility impacts
- Digital signatures applying to content not observable by the signer
- Covert content of various kinds
- Breaking of RDF/RDA connections into the document (along with failure to 
preserve markers correctly)

The digital signature and covert-content avoidance cases work against 
preserving material that is not evident in a given application.  In the case of 
ODF, the damage to tracked changes is survivable (with some loss), because the 
ODF approach is resilient.  But not knowing about the tracked changes gets into 
the digital signature problem if the material is preserved while not being 
visible to the user.

There is also a case of confusion between two consumers, having to do with how 
image renditions in ODF are negotiated: each consumer presents the best 
rendition that it recognizes, which is not necessarily the one the producer 
preferred among the choices it offered in the document.  This raises 
digital-signature considerations as well.
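
A rough sketch of the rendition case, under the assumption that the producer 
lists its choices in order of preference and that each consumer simply shows 
the first one it recognizes. The function name, media types, and consumer 
capabilities are hypothetical and only illustrate how two readers of the same 
signed document can end up looking at different images.

```python
# Hypothetical sketch: one image offered in several renditions, listed in the
# producer's (assumed) order of preference. Nothing here is an ODF API.

def pick_rendition(offered, supported):
    """Return the first offered media type the consumer supports, else None."""
    for media_type in offered:
        if media_type in supported:
            return media_type
    return None


# The producer's choices for a single image, most preferred first.
offered = ["image/svg+xml", "image/x-emf", "image/png"]

# Two consumers that recognize different subsets of those choices.
consumer_a = {"image/svg+xml", "image/png"}
consumer_b = {"image/png"}

shown_a = pick_rendition(offered, consumer_a)
shown_b = pick_rendition(offered, consumer_b)

# Same document bytes, same signature, different image on screen.
print(shown_a, shown_b)
```

A signature computed over the document covers every rendition, but each 
signer or verifier has only seen one of them.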

I don’t think this should stop the kind of exploration Peter Kelly is embarked 
upon.  At some point, these considerations will surface and it will be 
interesting to see what a creative accommodation might be.

It's not clear to me that the openoffice.org descendants can do much about 
format ecumenicalism very quickly, if at all, so I have probably gotten pretty 
off-topic at this point.


 -- Dennis E. Hamilton
    dennis.hamil...@acm.org    +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail



From: Peter Kelly [mailto:kelly...@gmail.com] 
Sent: Saturday, August 2, 2014 09:43
To: dev@openoffice.apache.org
Subject: Re: OOXML

On 2 Aug 2014, at 9:24 pm, Alexandro Colorado <j...@oooes.org> wrote:


The support that is done is to receive OOXML, not to produce them; the
discussion issue would be to support legacy formats like .doc or .xls.

I still don't see a point to generate OOXML and most people don't care
as long as they can send in office native formats.

I never heard someone saying, please send it on docx, your doc is a
closed binary format.

I (and I suspect I'm not alone) see the lack of an ability to 1) save OOXML 
documents and 2) do so while preserving all elements, including unsupported 
features and Microsoft-only data, as being the #1 limitation to OpenOffice 
today. The fact is, OOXML is in practice extremely widely used (vastly more so 
than ODF) and I argue that if OpenOffice is to have any relevance going forward 
it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just 
about importing files but enabling a period of a number of years during which 
an organisation can effectively work with a mixture of OOXML and ODF documents. 
This allows the transition to be done incrementally - a company with 30,000 
employees will only migrate if there's a way they can do so bit-by-bit, with 
some departments sticking with OOXML for longer than others. Because there will 
be people in different departments that need to work together, those who insist 
on remaining with OOXML for the time being must be able to collaborate in both 
directions with those who have switched for all their other documents.

It's the same situation as the transition Microsoft made from the old binary 
formats to OOXML - Office 2007 (and all later versions) still support the older 
formats, for both read and write, and I expect they will continue for some 
time. If Office 2007 had completely dropped support for saving .doc, .xls, and 
.ppt, it would have been dead-on-arrival, as it took several years before most 
people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting 
these formats. There is already an import filter which sort-of works (though I 
had to direct a customer to LibreOffice the other day as they were having 
trouble opening a perfectly-valid .docx using OpenOffice). This could be left 
in place, with fixes where necessary, and a new export filter written for 
saving. The problem with this however is that import/export is inherently a 
lossy process; if there is any information within a document that is not 
supported by OO or the filters, then it will be lost after an open/save. This 
information could also include proprietary extension data that is supported by 
Office which there is no way to interpret since its format is not published 
(macros, I believe, are an example of this).
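
The loss can be illustrated with a toy model, sketched here under the 
assumption that a document is just a list of (kind, payload) elements and 
that the filters only understand some kinds. All names below are hypothetical 
and are not taken from the OpenOffice filter code.

```python
# Toy model of a lossy open/save cycle. The element kinds and the KNOWN set
# are made up for illustration; real filters are far more involved.

KNOWN = {"heading", "paragraph"}


def import_doc(elements):
    """Import filter: keep only the element kinds the application understands."""
    return [(kind, payload) for kind, payload in elements if kind in KNOWN]


def export_doc(model):
    """Export filter: write out whatever is in the internal model."""
    return list(model)


original = [
    ("heading", "Report"),
    ("paragraph", "Hello"),
    ("macro", b"\x00\x01"),  # opaque data the filters cannot interpret
    ("paragraph", "Bye"),
]

saved = export_doc(import_doc(original))

# Anything dropped on import never reaches the saved file.
lost = [element for element in original if element not in saved]
print(lost)
```

Even with no edits at all, the open/save cycle silently discards the element 
the import filter did not recognize.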

The approach I took with UX Write was to use bidirectional transformation [1], 
which ensures updates happen in a non-destructive manner. When you open a .docx 
file in UX Write, it converts it into HTML, and keeps track of information that 
allows it to map each HTML element back to the original XML element in the 
.docx file from which it was generated. When you save the file, instead of 
overwriting it with a new version, it *updates* the existing version by 
figuring out what changes have occurred in the HTML document, and applying 
those changes to the original .docx file. This way, only the parts that the 
user has actually modified are touched; anything UX Write doesn't know about 
(e.g. embedded spreadsheets) is left untouched. I'm planning to use the same 
design for ODF.
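
As a minimal sketch of the idea (not the UX Write implementation), the two 
directions can be modelled as a get function that projects the supported 
parts of the source into an editable view, and a put function that merges the 
edited view back into the original source. The element structure, the 
SUPPORTED set, and the function names are all assumptions made for 
illustration; only text edits and deletions are handled.

```python
# Toy bidirectional transformation over a list of elements. Each element is a
# dict with an "id" and a "kind"; only kinds in SUPPORTED are editable.

SUPPORTED = {"paragraph"}


def get_view(source):
    """Forward direction: build the editable view, remembering source ids."""
    return [
        {"id": el["id"], "text": el["text"]}
        for el in source
        if el["kind"] in SUPPORTED
    ]


def put_back(source, view):
    """Backward direction: apply view edits to the original source.

    Unsupported elements are passed through as the very same objects;
    supported elements are replaced only if their text actually changed,
    and dropped if they no longer appear in the view.
    """
    edited = {item["id"]: item["text"] for item in view}
    result = []
    for el in source:
        if el["kind"] not in SUPPORTED:
            result.append(el)  # not understood, so left untouched
        elif el["id"] in edited:
            if edited[el["id"]] == el["text"]:
                result.append(el)  # unchanged
            else:
                result.append({**el, "text": edited[el["id"]]})
        # else: the user deleted this element in the view
    return result


source = [
    {"id": 1, "kind": "paragraph", "text": "Helo world"},
    {"id": 2, "kind": "embedded-spreadsheet", "blob": b"\x01\x02"},
    {"id": 3, "kind": "paragraph", "text": "Second paragraph"},
]

view = get_view(source)
view[0]["text"] = "Hello world"  # the only edit the user makes
updated = put_back(source, view)
```

Putting back an unedited view returns a source equal to the original, which 
is the round-trip property that makes the save non-destructive. Supporting a 
new feature in this toy amounts to adding a kind to SUPPORTED; everything 
else keeps passing through unchanged.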

Crucially, this meant that I was able to implement support for OOXML (well, 
specifically the WordprocessingML part of it) in an incremental fashion. First 
there was only support for editing text; then came basic formatting, then 
lists, tables, styles etc. Even today, my implementation doesn't have support 
for the complete feature set, but it is nonetheless able to "walk lightly" in 
editing the document, by not touching anything that isn't supported. This 
comes back to the migration path I mentioned above, whereby there is a need to 
interoperate with people using OOXML for some period of time, assuming they're 
eventually led towards using only ODF.

I'd be keen to hear any thoughts others have on this issue, in the sense of how 
best to tackle it within OpenOffice.

I recommend having a look at the slides linked to below, which give a great 
introduction to what bidirectional transformation is and how it works. There's 
been a ton of research done on this in the past, and I think it's ideal for 
dealing with different document formats, particularly when a given app treats 
a particular format as "native" (HTML in the case of UX Write, ODF in 
the case of OpenOffice). With this approach, we could bypass an entire class of 
compatibility problems where people complain of losing formatting or other 
information from their documents (and blame it on OpenOffice, telling their 
collaborators to use Microsoft Office instead).

[1] http://www.cis.upenn.edu/~bcpierce/papers/icmt-2009-slides.pdf

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
