In line with the sketch that Peter Kelly provides below, I am personally very sympathetic to the idea of having an internal model that can tolerate differences in format between input and output while preserving in the output everything from the input format that it can, even by leaving markers that will be useful on future input of the produced form. (There is a well-known case of Microsoft Office doing this for the HTML it exports, although the information added for recovery of the MSO rendition led to many complaints about document bloat.)
There are some conflicts between the desire to do this and the fact that some alterations have non-local consequences and may have other effects. I still support the idea, but there are some tricky cases, including

 - Changes that overlap/conflict with tracked changes, where the tracked changes are not updated/preserved properly
 - Accessibility impacts
 - Digital signatures applying to content not observable by the signer
 - Covert content of various kinds
 - Breaking of RDF/RDA connections into the document (along with failure to preserve markers correctly)

The digital signature and covert-content avoidance cases work against preserving material that is not evident in a given application. In the case of ODF, the damage to tracked changes is survivable (with some loss), because the ODF approach is resilient. But not knowing about the tracked changes gets into the digital signature problem if the material is preserved while not being visible to the user.

There is also a case around confusion between two consumers, having to do with how image renditions in ODF are negotiated: the consumer presents the best rendition that it recognizes, which is not necessarily the one the producer preferred among the choices it offered in the document. This raises digital signature considerations as well.

I don’t think this should stop the kind of exploration Peter Kelly is embarked upon. At some point, these considerations will surface and it will be interesting to see what a creative accommodation might be.

It's not clear to me that the openoffice.org descendants can do much about format ecumenicalism very quickly, if at all, so I have probably gotten pretty off-topic at this point.

 -- Dennis E. Hamilton
    dennis.hamil...@acm.org  +1-206-779-9430
    https://keybase.io/orcmid  PGP F96E 89FF D456 628A
    X.509 certs used and requested for signed e-mail

From: Peter Kelly [mailto:kelly...@gmail.com]
Sent: Saturday, August 2, 2014 09:43
To: dev@openoffice.apache.org
Subject: Re: OOXML

On 2 Aug 2014, at 9:24 pm, Alexandro Colorado <j...@oooes.org> wrote:

> The support that is done is to receive OOXML, not to produce it; the discussion issue would be to support legacy formats like .doc or .xls. I still don't see a point in generating OOXML, and most people don't care as long as they can send in Office native formats. I never heard someone saying, please send it in docx, your doc is a closed binary format.

I (and I suspect I'm not alone) see a lack of the ability to 1) Save OOXML documents and 2) Do so while preserving all elements, including unsupported features and Microsoft-only data, as being the #1 limitation of OpenOffice today. The fact is, OOXML is in practice extremely widely used (vastly more so than ODF) and I argue that if OpenOffice is to have any relevance going forward it must support it, and support it well.

The migration path in particular, which I mentioned previously, is not just about importing files but about enabling a period of a number of years during which an organisation can effectively work with a mixture of OOXML and ODF documents. This allows the transition to be done incrementally - a company with 30,000 employees will only migrate if there's a way they can do so bit-by-bit, with some departments sticking with OOXML for longer than others. Because there will be people in different departments that need to work together, those who insist on remaining with OOXML for the time being must be able to collaborate in both directions with those who have switched for all their other documents.

It's the same situation as the transition Microsoft made from the old binary formats to OOXML - Office 2007 (and all later versions) still supports the older formats, for both read and write, and I expect that will continue for some time. If Office 2007 had completely dropped support for saving .doc, .xls, and .ppt, it would have been dead-on-arrival, as it took several years before most people were saving in the newer format by default.

Now there is still the question of how OpenOffice could go about supporting these formats. There is already an import filter which sort-of works (though I had to direct a customer to LibreOffice the other day as they were having trouble opening a perfectly-valid .docx using OpenOffice). This could be left in place, with fixes where necessary, and a new export filter written for saving.

The problem with this, however, is that import/export is inherently a lossy process; if there is any information within a document that is not supported by OO or the filters, then it will be lost after an open/save. This information could also include proprietary extension data that is supported by Office but which there is no way to interpret, since its format is not published (macros, I believe, are an example of this).

The approach I took with UX Write was to use bidirectional transformation [1], which ensures updates happen in a non-destructive manner. When you open a .docx file in UX Write, it converts it into HTML, and keeps track of information that allows it to map each HTML element back to the original XML element in the .docx file from which it was generated. When you save the file, instead of overwriting it with a new version, it *updates* the existing version by figuring out what changes have occurred in the HTML document, and applying those changes to the original .docx file. This way, only the parts that the user has actually modified are touched; anything UX Write doesn't know about (e.g. embedded spreadsheets) is left untouched.
I'm planning to use the same design for ODF.

Crucially, this meant that I was able to implement support for OOXML (well, specifically the WordProcessingML part of it) in an incremental fashion. First there was only support for editing text; then came basic formatting, then lists, tables, styles etc. Even today, my implementation doesn't have support for the complete feature set, but it is nonetheless able to "walk lightly" in editing the document, by not touching anything that isn't supported.

Coming back to the migration path I mentioned above, whereby there is a need to be able to interoperate with people using OOXML for some period of time (assuming they're eventually led towards using only ODF): I'd be keen to hear any thoughts others have on this issue, in the sense of how best to tackle it within OpenOffice.

I recommend having a look at the slides linked to below, which give a great introduction to what bidirectional transformation is and how it works. There's been a ton of research done on this in the past, and I think it's ideal for dealing with different document formats, particularly when a given app treats a particular format as "native" (HTML in the case of UX Write, ODF in the case of OpenOffice). With this approach, we could bypass an entire class of compatibility problems where people complain of losing formatting or other information from their documents (and blame it on OpenOffice, telling their collaborators to use Microsoft Office instead).

[1] http://www.cis.upenn.edu/~bcpierce/papers/icmt-2009-slides.pdf

--
Dr. Peter M. Kelly
Founder, UX Productivity
pe...@uxproductivity.com
http://www.uxproductivity.com/
http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)