> > We do have an extra call when a duplicate rep is attempted to be > inserted into the rep-cache. When does that happen? I know it will > happen when called from build-repcache; what other use-cases trigger the > codepath you changed? Committing two files with equal contents > sequentially doesn't call svn_fs_fs__set_rep_reference() at all for the > second file. >
I have spent some time examining when it happens and I see the following cases: 1) Commit multiple files with the same content in one revision. 2) During commit, svn_fs_fs__set_rep_reference() is called for a property representation of each node, regardless of whether it is a duplicate or not. 3) Commit a file with the same content as a property representation. > What tests is this behaviour covered by? > I think that: - Case 1) is for example covered by the rep_sharing_effectiveness() test. - Case 2) is implicitly covered by various tests for versioned property changes. - Case 3) is not currently covered by a specific test, but the situation seems to be an edge case. > What scenario does this bring a performance improvement in? What is the > improvement, quantitatively? > I have compared the approaches with ‘FAIL’ and ‘IGNORE’ conflict resolution algorithms using a synthetic benchmark. Inserting 10000 identical entries into a rep-cache.db which contains 250000 entries: 1) In the case of using the ‘FAIL’ algorithm: ~40000 microseconds. 2) In the case of using the ‘IGNORE’ algorithm: ~15000 microseconds. Outside of the benchmark, there is ~15% improvement for the ‘build-repcache’ command. Based on this, I assume that there are other commands that may benefit from this change. I have attached an updated patch. Regards, Denis Kovalchuk
rep-cache.db insert optimization. Use the 'IGNORE' conflict resolution algorithm [1] for STMT_SET_REP and remove SVN_ERR_SQLITE_CONSTRAINT handling, because the error should not occur now. It brings a slight performance improvement, because we remove an extra svn_fs_fs__get_rep_reference() call. The result of a synthetic benchmark, where 10000 identical entries are inserted into a rep-cache.db which contains 250000 entries: 1) In the case of using the 'FAIL' algorithm: ~40000 microseconds. 2) In the case of using the 'IGNORE' algorithm: ~15000 microseconds. [1] https://www.sqlite.org/lang_conflict.html * subversion/libsvn_fs_fs/rep-cache-db.sql (STMT_SET_REP): Use the 'IGNORE' conflict resolution algorithm. * subversion/libsvn_fs_fs/rep-cache.c (svn_fs_fs__set_rep_reference): Remove SVN_ERR_SQLITE_CONSTRAINT handling. Patch by: Denis Kovalchuk <denis.kovalc...@visualsvn.com> Index: subversion/libsvn_fs_fs/rep-cache-db.sql =================================================================== --- subversion/libsvn_fs_fs/rep-cache-db.sql (revision 1875655) +++ subversion/libsvn_fs_fs/rep-cache-db.sql (working copy) @@ -61,7 +61,7 @@ WHERE hash = ?1 -- STMT_SET_REP /* Works for both V1 and V2 schemas. */ -INSERT OR FAIL INTO rep_cache (hash, revision, offset, size, expanded_size) +INSERT OR IGNORE INTO rep_cache (hash, revision, offset, size, expanded_size) VALUES (?1, ?2, ?3, ?4, ?5) -- STMT_GET_REPS_FOR_RANGE Index: subversion/libsvn_fs_fs/rep-cache.c =================================================================== --- subversion/libsvn_fs_fs/rep-cache.c (revision 1875655) +++ subversion/libsvn_fs_fs/rep-cache.c (working copy) @@ -328,7 +328,6 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs, { fs_fs_data_t *ffd = fs->fsap_data; svn_sqlite__stmt_t *stmt; - svn_error_t *err; svn_checksum_t checksum; checksum.kind = svn_checksum_sha1; checksum.digest = rep->sha1_digest; @@ -351,29 +350,8 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs, (apr_int64_t) rep->size, (apr_int64_t) rep->expanded_size)); - err = svn_sqlite__insert(NULL, stmt); - if (err) - { - representation_t *old_rep; + SVN_ERR(svn_sqlite__insert(NULL, stmt)); - if (err->apr_err != SVN_ERR_SQLITE_CONSTRAINT) - return svn_error_trace(err); - - svn_error_clear(err); - - /* Constraint failed so the mapping for SHA1_CHECKSUM->REP - should exist. If so that's cool -- just do nothing. If not, - that's a red flag! */ - SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, fs, &checksum, pool)); - - if (!old_rep) - { - /* Something really odd at this point, we failed to insert the - checksum AND failed to read an existing checksum. Do we need - to flag this? */ - } - } - return SVN_NO_ERROR; }