> On 30 Apr 2015, at 3:38 am, Gabriela Gibson <gabriela.gib...@gmail.com> wrote:
>
> Thanks Andrea :)
>
> It is confusing indeed. And to my embarrassment, I only skimmed the precis.
>
> Note that my understanding here was that the sample/documents are mostly
> aimed at new/casual users (who may well not know about the finer points)
> and would probably find the description docx more recognisable than ooxml.
>
> But looking at the other thread, samples/documents now seems to be moving
> towards being a developer stash of useful edge cases.

The vast majority of edge cases are actually in a bespoke text-based format I came up with for the *.test files. I did this when writing the test cases so that I could have all the data in a format that was easy to work with in a text editor and played nicely with version control.

I certainly agree it’s good to have a few sample documents in the repository. However, for large numbers of edge cases I would recommend the existing approach. Whether it would be better to store all the test cases as actual .docx files instead of the simplified text-based representation is something to think about, though changing that now would require a very large amount of work.

With dfutil, you can convert to and from the plain text (aka “text package”) format using the -fp and -pp (“pretty print”) options. For example:

    dfutil -pp file.docx

gives you all the XML content you need for a document, displayed on your terminal (or redirected to a file) for easy inspection/editing, without the awkward steps of having to unzip a file, run xmllint on specific files to indent the XML code, etc.

--
Dr Peter M. Kelly
pmke...@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
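
For comparison, the manual route mentioned above (unzip the package, then indent each XML part) can be sketched in a few lines of Python. This is only an illustrative sketch and not how dfutil works internally; the function names pretty_print_package and build_sample_package are made up for this example, and the in-memory archive merely stands in for a real .docx file.

```python
import io
import zipfile
from xml.dom import minidom


def pretty_print_package(source):
    """Return {part name: indented XML text} for each XML part in a zip package.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not part of dfutil.
    """
    parts = {}
    with zipfile.ZipFile(source) as package:
        for name in sorted(package.namelist()):
            if not name.endswith((".xml", ".rels")):
                continue  # skip images and other non-XML parts
            raw = package.read(name)
            parts[name] = minidom.parseString(raw).toprettyxml(indent="  ")
    return parts


def build_sample_package():
    """Build a tiny in-memory zip with one XML part, standing in for a .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "word/document.xml",
            "<document><body><p>Hello</p></body></document>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, xml_text in pretty_print_package(build_sample_package()).items():
        print("==", name)
        print(xml_text)
```

Passing a path such as "file.docx" instead of the in-memory buffer should work the same way, since a .docx file is a zip archive of XML parts.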