>
> We do have an extra call when a duplicate rep is attempted to be
> inserted into the rep-cache.  When does that happen?  I know it will
> happen when called from build-repcache; what other use-cases trigger the
> codepath you changed?  Committing two files with equal contents
> sequentially doesn't call svn_fs_fs__set_rep_reference() at all for the
> second file.
>

I have spent some time examining when it happens and I see the following
cases:
1) Commit multiple files with the same content in one revision.
2) During commit, svn_fs_fs__set_rep_reference() is called for a property
    representation of each node, regardless of whether it is a duplicate or
not.
3) Commit a file with the same content as a property representation.


> What tests is this behaviour covered by?
>

I think that:
- Case 1) is for example covered by the rep_sharing_effectiveness() test.
- Case 2) is implicitly covered by various tests for versioned property
changes.
- Case 3) is not currently covered by a specific test, but the situation
seems to be
an edge case.


> What scenario does this bring a performance improvement in?  What is the
> improvement, quantitatively?
>

I have compared the approaches with ‘FAIL’ and ‘IGNORE’ conflict resolution
algorithms using a synthetic benchmark. Inserting 10000 identical entries
into
a rep-cache.db which contains 250000 entries:

1) In the case of using the ‘FAIL’ algorithm: ~40000 microseconds.

2) In the case of using the ‘IGNORE’ algorithm: ~15000 microseconds.


Outside of the benchmark, there is ~15% improvement for the
‘build-repcache’ command.

Based on this, I assume that there are other commands that may benefit from
this change.


I have attached an updated patch.


Regards,

Denis Kovalchuk
rep-cache.db insert optimization.

Use the 'IGNORE' conflict resolution algorithm [1] for STMT_SET_REP
and remove SVN_ERR_SQLITE_CONSTRAINT handling, because the error
should not occur now. It brings a slight performance improvement,
because we remove an extra svn_fs_fs__get_rep_reference() call.

The result of a synthetic benchmark, where 10000 identical entries
are inserted into a rep-cache.db which contains 250000 entries:
1) In the case of using the 'FAIL' algorithm: ~40000 microseconds.
2) In the case of using the 'IGNORE' algorithm: ~15000 microseconds.

[1] https://www.sqlite.org/lang_conflict.html

* subversion/libsvn_fs_fs/rep-cache-db.sql
  (STMT_SET_REP): Use the 'IGNORE' conflict resolution algorithm.
  
* subversion/libsvn_fs_fs/rep-cache.c
  (svn_fs_fs__set_rep_reference): Remove SVN_ERR_SQLITE_CONSTRAINT handling.

Patch by: Denis Kovalchuk <denis.kovalc...@visualsvn.com>

Index: subversion/libsvn_fs_fs/rep-cache-db.sql
===================================================================
--- subversion/libsvn_fs_fs/rep-cache-db.sql    (revision 1875655)
+++ subversion/libsvn_fs_fs/rep-cache-db.sql    (working copy)
@@ -61,7 +61,7 @@ WHERE hash = ?1
 
 -- STMT_SET_REP
 /* Works for both V1 and V2 schemas. */
-INSERT OR FAIL INTO rep_cache (hash, revision, offset, size, expanded_size)
+INSERT OR IGNORE INTO rep_cache (hash, revision, offset, size, expanded_size)
 VALUES (?1, ?2, ?3, ?4, ?5)
 
 -- STMT_GET_REPS_FOR_RANGE
Index: subversion/libsvn_fs_fs/rep-cache.c
===================================================================
--- subversion/libsvn_fs_fs/rep-cache.c (revision 1875655)
+++ subversion/libsvn_fs_fs/rep-cache.c (working copy)
@@ -328,7 +328,6 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
 {
   fs_fs_data_t *ffd = fs->fsap_data;
   svn_sqlite__stmt_t *stmt;
-  svn_error_t *err;
   svn_checksum_t checksum;
   checksum.kind = svn_checksum_sha1;
   checksum.digest = rep->sha1_digest;
@@ -351,29 +350,8 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
                             (apr_int64_t) rep->size,
                             (apr_int64_t) rep->expanded_size));
 
-  err = svn_sqlite__insert(NULL, stmt);
-  if (err)
-    {
-      representation_t *old_rep;
+  SVN_ERR(svn_sqlite__insert(NULL, stmt));
 
-      if (err->apr_err != SVN_ERR_SQLITE_CONSTRAINT)
-        return svn_error_trace(err);
-
-      svn_error_clear(err);
-
-      /* Constraint failed so the mapping for SHA1_CHECKSUM->REP
-         should exist.  If so that's cool -- just do nothing.  If not,
-         that's a red flag!  */
-      SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, fs, &checksum, pool));
-
-      if (!old_rep)
-        {
-          /* Something really odd at this point, we failed to insert the
-             checksum AND failed to read an existing checksum.  Do we need
-             to flag this? */
-        }
-    }
-
   return SVN_NO_ERROR;
 }
 

Reply via email to