> On 30 Apr 2015, at 3:38 am, Gabriela Gibson <gabriela.gib...@gmail.com> wrote:
> 
> Thanks Andrea :)
> 
> It is confusing indeed.  And to my embarrassment, I only skimmed the precis.
> 
> Note that my understanding here was that the sample/documents are mostly
> aimed at new/casual users (who may well not know about the finer points)
> and would probably find the description docx more recognisable than ooxml.
> 
> But looking at the other thread, samples/documents now seems to be moving
> towards being a developer stash of useful edge cases.

The vast majority of edge cases are actually in a bespoke text-based format I 
came up with for the *.test files. I did this when writing the test cases so 
that I could have all the data in a format that was easy to work with in a text 
editor and played nicely with version control.

I certainly agree it’s good to have a few sample documents in the repository. 
However, for large numbers of edge cases I would recommend the existing 
approach. Whether or not it would be better to store all the test cases as 
actual .docx files instead of the simplified text-based representation is 
something to think about, though changing that now would require a very large 
amount of work.

With dfutil, you can convert to and from the plain text (aka “text package”) 
format using the -fp and -pp (“pretty print”) options. For example:

dfutil -pp file.docx

gives you all the XML content you need for a document displayed on your 
terminal (or redirected to a file) for easy inspection/editing without the 
awkward steps of having to unzip a file, run xmllint on specific files to 
indent the XML code, etc.
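
For comparison, here is a rough Python sketch of the manual route described 
above (unzip the package, then indent each XML part). It is not how dfutil 
works internally and it does not produce the text package format; the function 
name pretty_print_package is made up for illustration, and it only assumes 
that a .docx file is a zip archive whose XML parts end in .xml or .rels.

```python
import zipfile
from xml.dom import minidom


def pretty_print_package(path):
    """Return {part name: indented XML text} for the XML parts of a zip package.

    Hypothetical helper for illustration only; it mimics the manual
    "unzip, then indent" steps and is unrelated to dfutil's own code.
    """
    parts = {}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Skip binary parts such as embedded images.
            if not name.endswith((".xml", ".rels")):
                continue
            raw = package.read(name)
            parts[name] = minidom.parseString(raw).toprettyxml(indent="  ")
    return parts
```

Calling pretty_print_package("file.docx") and printing each value gives 
roughly what you would get from unzipping and running xmllint by hand, one 
part at a time.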

—
Dr Peter M. Kelly
pmke...@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
