> -----Original Message-----
> From: i...@apache.org [mailto:i...@apache.org]
> Sent: Monday 5 September 2016 13:33
> To: comm...@subversion.apache.org
> Subject: svn commit: r1759233 -
> /subversion/trunk/subversion/libsvn_wc/questions.c
> 
> Author: ivan
> Date: Mon Sep  5 11:32:54 2016
> New Revision: 1759233
> 
> URL: http://svn.apache.org/viewvc?rev=1759233&view=rev
> Log:
> Use SHA-1 checksum to find whether files are actually modified in working
> copy if timestamps don't match.
> 
> Before this change we were doing this:
> 1. Compare file timestamps: if they match, assume that files didn't change.
> 2. Open pristine file.
> 3. Read properties from wc.db and find whether translation is required.
> 4. Compare filesize with pristine filesize for files that do not
>    require translation. Assume that file is modified if the sizes differ.
> 5. Compare detranslated contents of working file with pristine.
> 
> Now behavior is the following:
> 1. Compare file timestamps: if they match, assume that files didn't change.
> 2. Read properties from wc.db and find whether translation is required.
> 3. Compare filesize with pristine filesize for files that do not
>    require translation. Assume that file is modified if the sizes differ.
> 4. Calculate SHA-1 checksum of detranslated contents of working file
>    and compare it with pristine's checksum stored in wc.db.

We looked at this before, and this change has pros and cons, depending on the
specific use case.

With the SHA-1 comparison we only have to read the working file, but we always
have to read all of it.

With the older system we could bail out at the first detected difference.

If there is a change somewhere, both systems read on average about 100% of the
file size. Only when nothing actually changed except the timestamp is the new
system less expensive.
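
To make the trade-off concrete, here is a minimal sketch of the two strategies
that counts how many bytes each one reads. It is not the libsvn_wc code: the
function names, the 4096-byte chunk size and the in-memory streams are all
assumptions for illustration, and translation (keywords, EOL style) is ignored.

```python
import hashlib
import io

CHUNK = 4096  # assumed chunk size, for illustration only


def compare_streams(working, pristine, chunk=CHUNK):
    """Old-style check: read both streams, stop at the first differing chunk.

    Returns (modified, bytes_read).
    """
    bytes_read = 0
    while True:
        a = working.read(chunk)
        b = pristine.read(chunk)
        bytes_read += len(a) + len(b)
        if a != b:
            return True, bytes_read   # early bail-out
        if not a:
            return False, bytes_read  # both streams exhausted, no difference


def compare_sha1(working, pristine_sha1, chunk=CHUNK):
    """New-style check: hash the whole working stream, compare to stored SHA-1.

    Returns (modified, bytes_read).
    """
    digest = hashlib.sha1()
    bytes_read = 0
    while True:
        a = working.read(chunk)
        if not a:
            break
        digest.update(a)
        bytes_read += len(a)
    return digest.hexdigest() != pristine_sha1, bytes_read


pristine_data = b"a" * 10000
pristine_sha1 = hashlib.sha1(pristine_data).hexdigest()
changed_early = b"b" + pristine_data[1:]  # same size, first byte differs

early_old = compare_streams(io.BytesIO(changed_early), io.BytesIO(pristine_data))
early_new = compare_sha1(io.BytesIO(changed_early), pristine_sha1)
same_old = compare_streams(io.BytesIO(pristine_data), io.BytesIO(pristine_data))
same_new = compare_sha1(io.BytesIO(pristine_data), pristine_sha1)
```

With a change in the first chunk, the stream comparison stops after one chunk
from each file, while the SHA-1 check reads the whole working file. With no
change at all, the stream comparison reads both files in full and the SHA-1
check reads only one.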


If the file happens to be a database file or something similar, a change
somewhere later in the file quite commonly comes with a change in the first
'block' (checksum, change counter, etc.). File formats like SQLite were
explicitly designed for this (and other cheap checks), with a change counter
at the start.
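
As an illustration of such a cheap check: the SQLite file format keeps a
4-byte big-endian file change counter at offset 24 of the 100-byte database
header. The sketch below reads it from the first bytes of a file; the function
name is hypothetical, and note that the counter may not be incremented on
every transaction in WAL mode, so this is an example of the idea rather than a
reliable modification test.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"  # 16-byte magic string at offset 0


def sqlite_change_counter(header):
    """Return the file change counter from a SQLite header, or None.

    `header` is the first bytes of the file (at least 28 are needed).
    """
    if len(header) < 28 or header[:16] != SQLITE_MAGIC:
        return None
    # File change counter: 4 bytes, big-endian, at offset 24.
    return struct.unpack(">I", header[24:28])[0]


# Hand-built header fragment for demonstration: magic, 8 filler bytes, counter.
demo_header = SQLITE_MAGIC + b"\x00" * 8 + struct.pack(">I", 7)
counter = sqlite_change_counter(demo_header)
not_sqlite = sqlite_change_counter(b"plain text file contents, long enough...")
```

A stream comparison against the pristine copy would see a difference in these
first 28 bytes and stop there, without reading the rest of the file.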


I don't think we should 'just change behavior' here without actual usage
numbers from our users. Perhaps we should make this feature configurable, or
make it depend on file size.



We certainly want the new behavior for non-pristine working copies (on the IDEA
list for years), but I'm not sure we always want it as the only option.



This mail is partly just to discuss this topic on the list, and to make sure
everybody knows what happened here and why.



        Bert

(Note that it is Labor Day in the USA today, so I don't expect many responses
until later this week.)
