Re: [dev] Constructing identical zip file from .ods/.odt contents

Bernd Eilers Wed, 06 Jun 2007 09:32:07 -0700


Hi there!


Denver Gingerich wrote:

On 6/5/07, Mathias Bauer <[EMAIL PROTECTED]> wrote:

Denver Gingerich wrote:

> For example, the following should work:
>
> $ svn commit test.odt
> $ mv test.odt test1.odt
> $ svn up
> [pulls deleted test.odt from repository]
> $ diff test.odt test1.odt && echo match
> match
>
> An SCM user might, for example, keep checksums of all the files they
> commit to verify that no one tampered with their files while the files
> were hosted on the SCM system.  If the checksum doesn't match and the
> SCM did not always return an identical binary for a given file and
> revision, then the user wouldn't know whether their file had been
> tampered with or if the SCM had just decided to modify it but it still
> "meant" the same thing.

So perhaps the checksum of the binary files is the wrong tool to detect
file identity? Perhaps the checksum should be created from the
uncompressed streams inside the file? Such functionality could be
implemented as an OOo extension so that users can check file integrity
by comparing the CRC with a stored one. Kind of a "poor man's signing". :-)



Implementing a checksum feature in OOo would be re-implementing a
function that is already provided by many widely-available tools such
as md5sum.  In the spirit of Unix simplicity, it would make more sense
to keep this function out of OOo and let the existing tools do what
they do best.

It makes more sense to be able to re-package extracted OOo file
contents to be identical to the original OOo file than to start
implementing more features that shouldn't be part of OOo to begin
with.


*hmm* I do have a slightly different view:

I do like the idea of Unix (and also open source in general) being a toolbox where existing tools can be reused very often, as much as you do, but ...


Being identical just does not mean the same thing everywhere; it depends on the context.

In the context of ODF, files can be identical without being 100% binary identical. Thus using an existing tool from the toolbox like md5sum to test for identity is just not good enough. For ODF, IMHO, an identical file has the same entries in the manifest.xml file, contains the same XML streams (compared at the XML level) and the same binary content referenced there, and contains the same thumbnail image and mimetype information. Things like the order of the files in the package, whether there is an additional entry for each parent directory in the ZIP file, which compression level is used, or how much ignorable whitespace there is in the XML streams are of no importance for an identity check.
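
To make that notion of identity a bit more concrete, here is a rough Python sketch of such a comparison. It is only an illustration under my own assumptions, not an existing OOo or odftoolkit tool: the function names are made up, every entry ending in ".xml" is treated as an XML stream, and "the same on XML level" is approximated by canonical XML with whitespace stripped (which also strips leading and trailing whitespace inside text, so it is more lenient than a real check should be).

```python
import io
import zipfile
from xml.etree import ElementTree as ET  # ET.canonicalize needs Python 3.8+


def _normalize(name, data):
    """Return a comparable form of one package entry (hypothetical helper)."""
    if name.endswith(".xml"):
        # Canonical XML evens out attribute order and quoting style.
        # strip_text=True drops whitespace-only text between elements, but it
        # also trims whitespace around real text: a simplification.
        return ET.canonicalize(from_file=io.BytesIO(data), strip_text=True)
    # mimetype, thumbnail and other binary content are compared byte for byte.
    return data


def logical_content(path):
    """Map entry name -> normalized content, skipping directory entries."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: _normalize(info.filename, zf.read(info))
            for info in zf.infolist()
            if not info.is_dir()
        }


def odf_equivalent(path_a, path_b):
    """True if both packages hold the same entries with the same content.

    Entry order, compression level and extra directory entries do not
    matter, because only names and uncompressed content are compared.
    """
    return logical_content(path_a) == logical_content(path_b)
```

Called as odf_equivalent("test.odt", "test1.odt"), this would report two packages as identical even when md5sum on the raw files disagrees.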

For things like integrating with a content management or version control system, I could imagine that, rather than an OOo extension for checking identity (as suggested by Mathias), an external tool that creates an ODF package checksum or compares two files for identity would be better. I could imagine that we might want to create such tool(s) in the context of the odftoolkit project.


See: http://odftoolkit.openoffice.org/

The implementation of such a tool might then also reuse the existing md5sum tool or an existing MD5 checksum library and just pipe into it the content extracted from the ODF file, sorted into a defined order beforehand or something like that, to keep to the spirit of Unix and open source simplicity you mentioned.
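
As a rough sketch of that idea in Python (purely illustrative; the function name odf_package_checksum and the exact byte layout fed into MD5 are my own assumptions, not anything odftoolkit defines): sort the entry names, then feed each name and its uncompressed content into a single MD5 digest. XML normalization is left out here to keep it short, so whitespace differences inside the XML streams would still change the result.

```python
import hashlib
import zipfile


def odf_package_checksum(path):
    """MD5 over the uncompressed entries of a package, in sorted name order.

    Independent of entry order, compression level and directory entries.
    This sketch does not normalize the XML streams.
    """
    digest = hashlib.md5()
    with zipfile.ZipFile(path) as zf:
        # Sorting gives a defined order regardless of how the ZIP was written.
        names = sorted(
            info.filename for info in zf.infolist() if not info.is_dir()
        )
        for name in names:
            data = zf.read(name)  # uncompressed bytes of the entry
            encoded_name = name.encode("utf-8")
            # Length prefixes keep entry boundaries unambiguous, so that
            # ("ab", "c") and ("a", "bc") cannot produce the same input.
            digest.update(len(encoded_name).to_bytes(8, "big"))
            digest.update(encoded_name)
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

An SCM user could store the value returned by odf_package_checksum("test.odt") at commit time and compare it again after checkout, instead of running md5sum on the raw file.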

PS: The odftoolkit project has just started and is open for new members to join, if you would like to help add something like this to the existing framework, e.g. to the odf4j (ODF for Java) library we are just creating there.

Denver


Kind regards,
Bernd Eilers

