Hi there!

Denver Gingerich wrote:
On 6/5/07, Mathias Bauer <[EMAIL PROTECTED]> wrote:

Denver Gingerich wrote:

> For example, the following should work:
>
> $ svn commit test.odt
> $ mv test.odt test1.odt
> $ svn up
> [pulls deleted test.odt from repository]
> $ diff test.odt test1.odt && echo match
> match
>
> An SCM user might, for example, keep checksums of all the files they
> commit to verify that no one tampered with their files while the files
> were hosted on the SCM system.  If the checksum doesn't match and the
> SCM did not always return an identical binary for a given file and
> revision, then the user wouldn't know whether their file had been
> tampered with or if the SCM had just decided to modify it but it still
> "meant" the same thing.

So perhaps the checksum of the binary files is the wrong tool to detect
file identity? Perhaps the checksum should be created from the
uncompressed streams inside the file? Such functionality could be
implemented as an OOo extension so that users can check file integrity
by comparing the CRC with a stored one. Kind of a "poor man's signing". :-)


Implementing a checksum feature in OOo would be re-implementing a
function that is already provided by many widely-available tools such
as md5sum.  In the spirit of Unix simplicity, it would make more sense
to keep this function out of OOo and let the existing tools do what
they do best.

It makes more sense to be able to re-package extracted OOo file
contents to be identical to the original OOo file than to start
implementing more features that shouldn't be part of OOo to begin
with.


*hmm* I do have a slightly different view:

Like you, I do like the idea of Unix (and also open source in general) being a toolbox whose existing tools can be reused as often as possible, but ...

Being identical does not mean the same thing everywhere; it depends on the context.

In the context of ODF, files can be identical without being 100% binary identical. So using an existing tool from the toolbox like md5sum to test for identity is just not good enough. For ODF, IMHO, two files are identical if they have the same entries in the manifest.xml file, contain the same XML streams (compared at the XML level) and the same binary content referenced there, and contain the same thumbnail image and mimetype information. Things like the order of the files in the package, whether there is an additional entry for each parent directory in the ZIP file, which compression level is used, or how much ignorable whitespace there is in the XML streams are of no importance for an identity check.
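As a rough sketch of that kind of identity check, the Python below compares two packages entry by entry while ignoring entry order, directory entries and compression level. The function names (normalized_entries, same_odf_content) are made up for illustration and are not part of OOo or the odftoolkit. It assumes every entry ending in ".xml" is well-formed XML, and it approximates "ignorable whitespace" with the strip_text option of canonicalize (Python 3.8+), which may also drop whitespace that is significant in some documents.

```python
import io
import zipfile
from xml.etree.ElementTree import canonicalize


def normalized_entries(package):
    """Map entry name -> normalized bytes for one ODF package.

    `package` is a path or a binary file-like object.
    """
    entries = {}
    with zipfile.ZipFile(package) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # explicit directory entries carry no content
            data = zf.read(info)  # uncompressed bytes, whatever the level
            if info.filename.endswith(".xml"):
                # Canonical XML (C14N); strip_text trims leading and
                # trailing whitespace of text nodes. This is an assumption
                # about what counts as ignorable.
                text = canonicalize(from_file=io.BytesIO(data), strip_text=True)
                data = text.encode("utf-8")
            entries[info.filename] = data
    return entries


def same_odf_content(a, b):
    """True if both packages hold the same entries after normalization."""
    return normalized_entries(a) == normalized_entries(b)
```

Because the entries are collected into a dictionary, their order in the ZIP file does not matter. The order of elements inside manifest.xml still does, so a real tool would probably need to treat the manifest specially.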

For things like integrating with a content management or version control system, I could imagine that an external tool to create an ODF package checksum, or to compare two files for identity, would be better than an OOo extension for checking identity (as suggested by Mathias). I could imagine that we might want to create such tool(s) in the context of the odftoolkit project.

See: http://odftoolkit.openoffice.org/

The implementation of such a tool might then also reuse the existing md5sum tool or an existing MD5 checksum library, and just pipe into it the content extracted from the ODF file, sorted into a defined order beforehand or something like that, to keep to the spirit of Unix and open source simplicity you have been mentioning.
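A minimal sketch of that idea, assuming Python and its standard hashlib and zipfile modules rather than any existing odftoolkit code: the entries are read uncompressed, sorted by name, and fed into one MD5 digest. The function name odf_package_md5 and the length-prefix framing are illustrative choices, not an agreed format, and the XML streams are hashed as raw bytes here, so differences in ignorable whitespace would still change the result.

```python
import hashlib
import zipfile


def odf_package_md5(package):
    """MD5 over the uncompressed entries of a package, in sorted name order.

    `package` is a path or a binary file-like object.
    """
    digest = hashlib.md5()
    with zipfile.ZipFile(package) as zf:
        # Skip directory entries; sort so the ZIP entry order is irrelevant.
        names = sorted(n for n in zf.namelist() if not n.endswith("/"))
        for name in names:
            data = zf.read(name)
            encoded = name.encode("utf-8")
            # Length-prefix name and content so entry boundaries are
            # unambiguous in the hashed byte stream.
            digest.update(len(encoded).to_bytes(8, "big") + encoded)
            digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()
```

Two packages that differ only in entry order, directory entries or compression level then give the same hex digest, which could be stored next to the file in the SCM in place of a plain md5sum of the ZIP file.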

PS: The odftoolkit project has just started and is open for new members. You are welcome to join if you would like to help add something like this to the existing framework, e.g. to the odf4j (ODF for Java) library we are just creating there.

Denver


Kind regards,
Bernd Eilers
