(New thread, taken from an observation in the "No-op changes no longer
dumped..." thread. In that thread, I hadn't spotted that it is only
the props comparison.)

Starting from 1.9.0, the FS API content comparison methods

    svn_fs_contents_changed()
    svn_fs_contents_different()
    svn_fs_props_changed()
    svn_fs_props_different()

are implemented in FSFS by svn_fs_fs__dag_things_different() which calls

    svn_fs_fs__prop_rep_equal(strict=TRUE/FALSE)
    and/or
    svn_fs_fs__file_text_rep_equal(strict=TRUE/FALSE)

* svn_fs_fs__file_text_rep_equal() uses MD5 checksum as a quick check,
returns 'not equal' if MD5s do not match, else gets a definitive
answer by comparing SHA1s (if present) or full text [1]. That's fine.

* svn_fs_fs__prop_rep_equal(), by contrast, reports that two
properties-reps are equal if their MD5 checksums are equal. [2]

These functions were introduced in
http://svn.apache.org/viewvc?view=revision&revision=1572336. The log
message indicates relying on MD5 equality for properties was
intentional, but is it consistent with the general guarantees we want
from Subversion? I think we generally want to rely on at least SHA1
equality, and I think we should apply that rule for all data.

If we change this to return early only if MD5s differ, else fall
through to a full check if MD5s match, the single run-time cost of
falling through is not severe (and is a hit that we take anyway
whenever one set of props is in a txn rather than a revision), and the
frequency of hitting this code path would be near zero -- only cases
where there is an MD5 collision. So I think not relying on MD5
equality here is a definite improvement.

In terms of severity, I think we should backport "don't rely on MD5
equality" to 1.9.x, but with no special urgency.

FSFS and FSX are affected equally; BDB is unaffected, as its code in
txn_body_props_changed() doesn't use the MD5 shortcut.

On looking further, I found more concerns with
svn_fs_fs__prop_rep_equal(). In non-strict mode it behaves quite
differently for (rev:rev) comparison than for a (rev:txn) or (txn:txn)
comparison. I don't like this; I think that is likely to lead to
problems.

The four comparison methods in the FS API are tested by
tests/libsvn_fs/fs-test[.c] 48.

  * This test only tests (rev:rev) comparisons.
  * This test looks includes a case where text and a property are set
to two versions of 'evil' text thay yield the same MD5 sum. I
initially thought this case would ensure test coverage for the unusual
(MD5 collision) case for all functions under test, but it does not.
The 'evil' text forms only a substring of the property-list rep, and
the MD5 sums of the full reps no longer collide.

Also it's confusing because the test sub-case 'evil text' (i=3)
doesn't actually end up executing the code path for finding identical
MD5s of the reps, because the special text is just the value of one
prop and not the complete rep of the properties list.

I propose the attached patch, for a start. It doesn't do anything to
resolve the change (regression) of the exact behaviour of non-strict
mode comparisons, which is the subject of another thread. It only
addresses the issues listed in this email.

Thoughts?

- Julian



[1] An extract from svn_fs_fs__file_text_rep_equal():
http://svn.apache.org/viewvc/subversion/tags/1.9.2/subversion/libsvn_fs_fs/fs_fs.c?view=markup#l1417

  /* File text representations always know their checksums - even in a txn. */
  if (memcmp(rep_a->md5_digest, rep_b->md5_digest, sizeof(rep_a->md5_digest)))
    {
      *equal = FALSE;
      return SVN_NO_ERROR;
    }

  /* Paranoia. Compare SHA1 checksums because that's the level of
     confidence we require for e.g. the working copy. */
  if (rep_a->has_sha1 && rep_b->has_sha1)
    {
      *equal = memcmp(rep_a->sha1_digest, rep_b->sha1_digest,
                      sizeof(rep_a->sha1_digest)) == 0;
      return SVN_NO_ERROR;
    }

[2] An extract from svn_fs_fs__prop_rep_equal():
http://svn.apache.org/viewvc/subversion/tags/1.9.2/subversion/libsvn_fs_fs/fs_fs.c?view=markup#l1490

  /* Committed property lists can be compared quickly */
  if (   rep_a && rep_b
      && !svn_fs_fs__id_txn_used(&rep_a->txn_id)
      && !svn_fs_fs__id_txn_used(&rep_b->txn_id))
    {
      /* MD5 must be given. Having the same checksum is good enough for
         accepting the prop lists as equal. */
      *equal = memcmp(rep_a->md5_digest, rep_b->md5_digest,
                      sizeof(rep_a->md5_digest)) == 0;
      return SVN_NO_ERROR;
    }
Improve consistency of detection of content 'changes' and 'differences' in
FSFS (and FSX).

Ensure the result is consistent between transactions and revisions.
Previously a non-strict properties comparison gave different results for
(rev:rev) than for (rev:txn) or (txn:txn).

Ensure the result is consistent between text content and propertly lists.
Use the same method to compare text reps as to compare property reps, and
share the code.

Don't rely on MD5 checksums being equal to detect equality of property
lists. In case of an MD5 collision, do a more rigorous check, at least a
SHA1 comparison, as we were already doing for file text.

* subversion/libsvn_fs_fs/fs_fs.c
  (rep_equal): New, extracted from svn_fs_fs__file_text_rep_equal().
  (svn_fs_fs__file_text_rep_equal): Use it.
  (svn_fs_fs__prop_rep_equal): Use it.

* subversion/libsvn_fs_x/fs_x.c
  (svn_fs_x__prop_rep_equal): 
    ### TODO: The same...

* subversion/tests/libsvn_fs/fs-test.c
  (compare_contents): ### ...
--This line, and those below, will be ignored--

Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c	(revision 1705023)
+++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
@@ -1384,23 +1384,25 @@ svn_fs_fs__file_length(svn_filesize_t *l
       *length = data_rep->expanded_size;
     }
 
   return SVN_NO_ERROR;
 }
 
-svn_error_t *
-svn_fs_fs__file_text_rep_equal(svn_boolean_t *equal,
-                               svn_fs_t *fs,
-                               node_revision_t *a,
-                               node_revision_t *b,
-                               svn_boolean_t strict,
-                               apr_pool_t *scratch_pool)
+/* ...
+ */
+static svn_error_t *
+rep_equal(svn_boolean_t *equal,
+          svn_fs_t *fs,
+          representation_t *rep_a,
+          representation_t *rep_b,
+          const svn_fs_id_t *id_a,
+          const svn_fs_id_t *id_b,
+          svn_boolean_t strict,
+          apr_pool_t *scratch_pool)
 {
   svn_stream_t *contents_a, *contents_b;
-  representation_t *rep_a = a->data_rep;
-  representation_t *rep_b = b->data_rep;
   svn_boolean_t a_empty = !rep_a || rep_a->expanded_size == 0;
   svn_boolean_t b_empty = !rep_b || rep_b->expanded_size == 0;
 
   /* This makes sure that neither rep will be NULL later on */
   if (a_empty && b_empty)
     {
@@ -1428,13 +1430,13 @@ svn_fs_fs__file_text_rep_equal(svn_boole
       *equal = memcmp(rep_a->sha1_digest, rep_b->sha1_digest,
                       sizeof(rep_a->sha1_digest)) == 0;
       return SVN_NO_ERROR;
     }
 
   /* Same path in same rev or txn? */
-  if (svn_fs_fs__id_eq(a->id, b->id))
+  if (svn_fs_fs__id_eq(id_a, id_b))
     {
       *equal = TRUE;
       return SVN_NO_ERROR;
     }
 
   /* Old repositories may not have the SHA1 checksum handy.
@@ -1458,64 +1460,34 @@ svn_fs_fs__file_text_rep_equal(svn_boole
                                    scratch_pool));
 
   return SVN_NO_ERROR;
 }
 
 svn_error_t *
+svn_fs_fs__file_text_rep_equal(svn_boolean_t *equal,
+                               svn_fs_t *fs,
+                               node_revision_t *a,
+                               node_revision_t *b,
+                               svn_boolean_t strict,
+                               apr_pool_t *scratch_pool)
+{
+  SVN_ERR(rep_equal(equal, fs, a->data_rep, b->data_rep,
+                    a->id, b->id, strict, scratch_pool));
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
 svn_fs_fs__prop_rep_equal(svn_boolean_t *equal,
                           svn_fs_t *fs,
                           node_revision_t *a,
                           node_revision_t *b,
                           svn_boolean_t strict,
                           apr_pool_t *scratch_pool)
 {
-  representation_t *rep_a = a->prop_rep;
-  representation_t *rep_b = b->prop_rep;
-  apr_hash_t *proplist_a;
-  apr_hash_t *proplist_b;
-
-  /* Mainly for a==b==NULL */
-  if (rep_a == rep_b)
-    {
-      *equal = TRUE;
-      return SVN_NO_ERROR;
-    }
-
-  /* Committed property lists can be compared quickly */
-  if (   rep_a && rep_b
-      && !svn_fs_fs__id_txn_used(&rep_a->txn_id)
-      && !svn_fs_fs__id_txn_used(&rep_b->txn_id))
-    {
-      /* MD5 must be given. Having the same checksum is good enough for
-         accepting the prop lists as equal. */
-      *equal = memcmp(rep_a->md5_digest, rep_b->md5_digest,
-                      sizeof(rep_a->md5_digest)) == 0;
-      return SVN_NO_ERROR;
-    }
-
-  /* Same path in same txn? */
-  if (svn_fs_fs__id_eq(a->id, b->id))
-    {
-      *equal = TRUE;
-      return SVN_NO_ERROR;
-    }
-
-  /* Skip the expensive bits unless we are in strict mode.
-     Simply assume that there is a difference. */
-  if (!strict)
-    {
-      *equal = FALSE;
-      return SVN_NO_ERROR;
-    }
-
-  /* At least one of the reps has been modified in a txn.
-     Fetch and compare them. */
-  SVN_ERR(svn_fs_fs__get_proplist(&proplist_a, fs, a, scratch_pool));
-  SVN_ERR(svn_fs_fs__get_proplist(&proplist_b, fs, b, scratch_pool));
-
-  *equal = svn_fs__prop_lists_equal(proplist_a, proplist_b, scratch_pool);
+  SVN_ERR(rep_equal(equal, fs, a->data_rep, b->data_rep,
+                    a->id, b->id, strict, scratch_pool));
   return SVN_NO_ERROR;
 }
 
 
 svn_error_t *
 svn_fs_fs__file_checksum(svn_checksum_t **checksum,
Index: subversion/tests/libsvn_fs/fs-test.c
===================================================================
--- subversion/tests/libsvn_fs/fs-test.c	(revision 1705023)
+++ subversion/tests/libsvn_fs/fs-test.c	(working copy)
@@ -5947,12 +5947,14 @@ purge_txn_test(const svn_test_opts_t *op
   /* Check the list. It should have no entries. */
   SVN_TEST_ASSERT(txn_list->nelts == 0);
 
   return SVN_NO_ERROR;
 }
 
+/* ### This currently only tests svn_fs_{contents,props}_{different,changed}
+       on revision roots, not on txn roots. */
 static svn_error_t *
 compare_contents(const svn_test_opts_t *opts,
                  apr_pool_t *pool)
 {
   svn_fs_t *fs;
   svn_fs_txn_t *txn;
@@ -5993,15 +5995,15 @@ compare_contents(const svn_test_opts_t *
 
       svn_boolean_t different;  /* result of svn_fs_*_different */
       svn_tristate_t changed;   /* result of svn_fs_*_changed */
     } to_compare[] =
     {
       /* same representation */
-      { 1, "foo", 2, "foo", FALSE, svn_tristate_false },
-      { 1, "foo", 2, "bar", FALSE, svn_tristate_false },
-      { 2, "foo", 2, "bar", FALSE, svn_tristate_false },
+      { 1, "foo", 2, "foo", FALSE, svn_tristate_unknown },
+      { 1, "foo", 2, "bar", FALSE, svn_tristate_unknown },
+      { 2, "foo", 2, "bar", FALSE, svn_tristate_unknown },
 
       /* different content but MD5 check is not reliable */
       { 3, "foo", 3, "bar", TRUE, svn_tristate_true },
 
       /* different contents */
       { 1, "foo", 3, "bar", TRUE, svn_tristate_true },

Reply via email to