Within your !hash1.equals(hash2) code, you could save the byte streams to disk, extract the zip files, and use a diff utility to figure out why they're different.
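If writing the files to disk is inconvenient in a test, roughly the same check can be done in memory. The following is only a sketch, not anything from POI: the class name ZipEntryDiff and its methods are hypothetical, and it assumes both workbooks are already available as byte arrays. It hashes the uncompressed content of each zip entry, so the entry timestamps do not affect the result, and reports which entries differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of POI.
public class ZipEntryDiff {

    /** Maps each entry name to a hex SHA-256 of its uncompressed bytes. */
    public static Map<String, String> entryDigests(byte[] zipBytes) {
        Map<String, String> digests = new TreeMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zin.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) {
                    hex.append(String.format("%02x", b & 0xff));
                }
                digests.put(entry.getName(), hex.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return digests;
    }

    /** Names of entries that are missing on one side or whose content differs. */
    public static Set<String> differingEntries(byte[] zipA, byte[] zipB) {
        Map<String, String> a = entryDigests(zipA);
        Map<String, String> b = entryDigests(zipB);
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        Set<String> diff = new TreeSet<>();
        for (String name : names) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                diff.add(name);
            }
        }
        return diff;
    }

    /** Demo-only helper: builds a one-entry zip with the given entry timestamp. */
    public static byte[] singleEntryZip(String name, String content, long timeMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry = new ZipEntry(name);
            entry.setTime(timeMillis);
            zout.putNextEntry(entry);
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Calling ZipEntryDiff.differingEntries(bytes1, bytes2) inside the !hash1.equals(hash2) branch would narrow the problem down to specific parts of the package. It only ignores zip metadata; it does not canonicalize the XML inside each entry, so attribute-order differences would still show up as differing entries.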
We use a lot of hash maps, which do not guarantee iteration order. If those maps are serialized as <node attr1="value1" attr2="value2"/>, the attributes could be swapped. Some values are saved as <node attr1="key1=value1;key2=value2" />, which has the same problem.

Not only do you need to diff your XML in a canonical form (order of XML nodes and attributes, whitespace, self-closing vs. paired closing tags, etc.), but any data in attributes (such as the serialized map above, or IDs that link portions of an XML document or multiple XML documents) also needs to be considered.

For example, do you consider two workbooks that print identically to be different if one workbook's StyleTable has an extra unused style? How about two workbooks where two cell styles are swapped and all references to those styles are updated? The answer might be no for some purposes and yes for others.

On Aug 1, 2016 06:10, "Nick Burch" <[email protected]> wrote:

> On Mon, 1 Aug 2016, [email protected] wrote:
>
>> we've been experiencing an indeterminism problem with POI's xlsx format,
>> when generating hash values with the following method in testng test cases:
>
> XLSX uses Zip files, which contain within them file dates. If you're
> comparing the outer zip file, you would expect an otherwise identical file
> to change every second as the datetimes move on
>
> Nick
