On Tuesday, April 18, Michael Gersten wrote:
>
> MD5 *must* duplicate. It may never duplicate in practice; it may never
> duplicate over the life of a single project. But if you are designing
> aircraft software, you must be able to say 'we need to check every byte
> for changes'.
I'm not sure what the "problem" really is. Yes, what you say is true.
However, you only get an accurate picture of how good or bad it will be
once you do the analysis in terms of MD5 (or whatever hash is going to
be used), take into account the properties that we say source files
have (the properties that CVS utilizes to optimize the storage of
them), and then figure out the likelihood of MD5 (or whatever hash)
failing...
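
For a rough feel of the numbers, here is a minimal sketch of the usual
birthday-bound estimate. It assumes the hash behaves like a uniformly
random 128-bit function on ordinary, non-adversarial source files, which
is an assumption and not a result of the analysis asked for above. The
name collision_probability is made up for illustration.

```python
import math


def collision_probability(num_items: int, hash_bits: int = 128) -> float:
    """Birthday-bound estimate of at least one collision among num_items.

    Assumption: hash values are independent and uniform over 2**hash_bits
    outputs. Real source files are not random, so treat this as a rough
    guide only.
    """
    pairs = num_items * (num_items - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    for n in (2, 10**6, 10**9):
        print(n, collision_probability(n))
```

Under that assumption, two distinct revisions of a file collide with
probability 2^-128 (about 2.9e-39), and even a million distinct
revisions give a figure on the order of 1e-27.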
> Bottom line: MD5 checking can/should be switch enable-able, default on.
Hmm, I suppose you might have a switch. On the other hand, if a failure
is rare enough (as defined above), say on the order of 1 in 10^18 or
rarer, then I'd say that a switch is not necessary, and a quick "edit"
of CVS/Entries (or wherever the MD5 sums are stored) would be more than
enough...
> Timestamp checking can/should be switch enable-able, default on
> Always check every byte can/should be switch enable-able, default off
Checking every byte on every operation is not practical. You can achieve
the same thing by checking out another copy, and doing a "cmp" or "diff"
between the files or trees.
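
If "cmp" and "diff" are not at hand, the same byte-for-byte comparison
can be sketched in a few lines. This is only an illustration: the
function name differing_files is made up, it only reports files that
exist in the first tree, and it skips the CVS administrative
directories since those differ between checkouts anyway.

```python
import filecmp
import os


def differing_files(tree_a, tree_b):
    """Return relative paths under tree_a that are missing from tree_b
    or whose contents differ byte for byte."""
    differing = []
    for root, dirs, files in os.walk(tree_a):
        # Prune CVS admin directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d != "CVS"]
        for name in files:
            path_a = os.path.join(root, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            # shallow=False forces a comparison of the file contents
            # instead of trusting size and timestamp.
            if not os.path.isfile(path_b) or not filecmp.cmp(
                path_a, path_b, shallow=False
            ):
                differing.append(rel)
    return sorted(differing)
```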
> (it is needed if you are in an environment where timestamps can be
> changed or mis-maintained, such as an NT 4.0 sp 6 workstation box
> smb-mounted onto a linux system, as I have -- timestamps on files are not
> reliable, but at least are (seem to be) consistent.)
Timestamps should be checked first. If they mismatch, then compute the
MD5. I'd say this should be enough of an "optimization" to keep *most*
MD5 computations in check. Oh, and if the timestamp is the same, but
the file has changed (so the MD5 check never fires), a quick "touch" of
the file should fix all...
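
The ordering above could be sketched roughly like this. It is not how
CVS actually does it; recorded_mtime and recorded_md5 are hypothetical
stand-ins for whatever ends up stored in CVS/Entries, and md5_of and
is_modified are names made up for the example.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path, recorded_mtime, recorded_md5):
    """Cheap timestamp check first; hash only when the timestamps differ."""
    # Compare at whole-second resolution (an assumption of this sketch).
    if int(os.stat(path).st_mtime) == int(recorded_mtime):
        # Same timestamp: trust it and skip the MD5 entirely. A changed
        # file with an unchanged timestamp is missed here, which is the
        # case a "touch" is meant to fix.
        return False
    # Timestamps differ: the MD5 decides whether the contents changed.
    return md5_of(path) != recorded_md5
```

A file that was merely touched costs one MD5 computation and is still
reported as unmodified, and a file whose timestamp matches costs only a
stat call.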
--Toby.