FSFS replication and the rep cache

Julian Foad Tue, 21 Aug 2018 06:12:48 -0700

Hello Philip, Stefan, and other devs.

Doug Robinson of WANdisco asked for my assistance with their particular use of 
FSFS.


WANdisco intercepts calls into the FSFS API in order to replicate a commit from 
an originating repository to other repositories. The procedure involves 
(omitting many details):

1. Subversion builds a commit txn in the usual way on the originating repo.

2. Subversion calls svn_fs_commit_txn(), which is intercepted:

  2a. The on-disk txn data is copied to the other repositories.
  2b. The interception layer makes the txn->commit() call into FSFS on each 
      repository.

It is considered necessary for the revision contents to be bit-for-bit 
identical in each repository. The rep-cache contents, on the other hand, can 
vary between repositories, because a rep-cache update can fail without failing 
the commit, unless we do something further to keep the rep-caches synchronized 
too (which is possible but not otherwise necessary).

So let me describe the problem.

We can consider the FSFS rep-cache processing in three parts:

1. While building a txn, FSFS looks up each new file text rep in the rep-cache 
and, if found, avoids adding a duplicate copy of it to the txn.
2. During commit-txn, in write_final_rev(), FSFS (>= 1.8) looks up each new 
properties rep in the rep-cache and, if found, avoids adding a duplicate copy 
of it to the txn.
3. During commit-txn, in svn_fs_fs__commit(), FSFS writes new entries to the 
rep-cache.

Part 1 is fine because the txn is built on the originating node, with duplicate 
reps omitted or included as determined by that repository's rep-cache, and will 
be valid on all repositories regardless of the contents of their local 
rep-caches.

Part 2 is the problem. The commit is performed separately on each repository, 
and the result is influenced by each repository's local rep-cache, but the 
rep-caches are not guaranteed to have identical content.
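
To illustrate the divergence, here is a toy model in Python. It is only a 
sketch: commit_txn(), the dict-based rep-cache and the text "revision" format 
are all invented for illustration and do not correspond to real FSFS code or 
on-disk formats. It models part 2 only.

```python
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def commit_txn(txn, rep_cache):
    """Toy stand-in for the commit step (hypothetical, not FSFS code).

    txn is a list of (path, props_bytes); rep_cache maps a SHA-1 of the
    props to the revision that already stores an identical rep.
    Returns the bytes of the "revision" that would be written.
    """
    lines = []
    for path, props in txn:
        hit = rep_cache.get(sha1(props))  # part 2: props lookup at commit time
        if hit is not None:
            lines.append(f"{path} props -> shared with r{hit}")
        else:
            lines.append(f"{path} props -> new rep: {props.decode()}")
    return "\n".join(lines).encode()


props = b"svn:eol-style=native"
txn = [("trunk/README", props)]   # the same txn data is copied to both repos

cache_a = {sha1(props): 5}        # repo A recorded this rep when writing r5
cache_b = {}                      # repo B's rep-cache update for r5 failed

rev_a = commit_txn(txn, cache_a)
rev_b = commit_txn(txn, cache_b)
print(rev_a == rev_b)             # the two "revisions" are not identical
```

The same txn produces different revision bytes as soon as one local rep-cache 
is missing an entry that the other has.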

We should also review the reasons why bit-for-bit identical revisions are 
needed. Before FSFS f7 it was necessary that the byte offsets in all preceding 
revisions were identical across repositories so that the new revision could be 
replicated without rewriting it. With f7 logical addressing that should no 
longer be necessary, but I have not reviewed this in detail. Other reasons 
include ease of checking whether replicated repositories are logically 
identical and ease of repair if one repository suffers corruption. So in 
principle the requirement could be dropped, but in practice at present it very 
probably needs to be kept.
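
As an example of why identical bytes make checking easy, here is a minimal 
sketch in Python that compares per-revision files by digest. The function 
names and the rev-to-path mappings are assumptions for illustration; it knows 
nothing about the real FSFS layout (sharding, packing) and just compares 
whatever files it is given.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def diverging_revisions(rev_files_a, rev_files_b):
    """Return the revisions present in both mappings whose files differ.

    Each argument maps a revision number to a file path (hypothetical
    input; how those paths are found is outside this sketch).
    """
    common = rev_files_a.keys() & rev_files_b.keys()
    return sorted(rev for rev in common
                  if file_digest(rev_files_a[rev]) != file_digest(rev_files_b[rev]))


# Demo with throwaway files standing in for revision files.
with tempfile.TemporaryDirectory() as tmp:
    def write(name, data):
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(data)
        return path

    repo_a = {1: write("a1", b"same bytes"), 2: write("a2", b"props stored inline")}
    repo_b = {1: write("b1", b"same bytes"), 2: write("b2", b"props shared with r1")}
    diverged = diverging_revisions(repo_a, repo_b)

print(diverged)
```

If revisions were only logically identical, a check like this would need to 
parse and compare the revision contents instead.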

Potential solutions include:

  * We could ignore the rep cache during commit-txn (in existing API: set 
fs_fs_data.rep_sharing_allowed = FALSE), then make a separate call to update 
the rep cache afterwards.

  * We could change FSFS to allow selectively disabling part 2 (look up props) 
during the 'commit' step while keeping part 3 (update) enabled (split that flag 
into two), and disable part 2 only.

With these first two options, the rep cache would deduplicate only file 
contents, and that is fine. Deduplication of properties is relatively minor.

  * We could change FSFS such that commit-txn no longer depends on the 
rep-cache content, by moving the props deduplication to the txn-building phase.

  * We could ensure the rep cache is synchronized across repositories before 
each commit-txn.

I have not yet estimated the effort for the various options, but at first 
glance splitting the flag into two and turning off property deduplication looks 
simple while the other three options look significantly harder.
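
To make the flag-splitting option concrete, here is a toy model in Python. It 
is a sketch under assumptions: RepSharing, its two field names and commit() 
are hypothetical and only mimic the idea of splitting rep_sharing_allowed into 
a "look up props" switch (part 2) and an "update rep-cache" switch (part 3).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RepSharing:
    # Hypothetical split of the single rep_sharing_allowed flag.
    lookup_props_on_commit: bool = True   # part 2
    update_cache_on_commit: bool = True   # part 3


def commit(txn, rep_cache, new_rev, opts):
    """Toy commit: returns the "revision" as a list of tuples."""
    revision = []
    for path, props in txn:
        key = hashlib.sha1(props).hexdigest()
        hit = rep_cache.get(key) if opts.lookup_props_on_commit else None
        if hit is not None:
            revision.append((path, "shared", hit))
        else:
            revision.append((path, "new", props))
    if opts.update_cache_on_commit:
        for _path, props in txn:
            # Keep the earliest revision if the rep is already known.
            rep_cache.setdefault(hashlib.sha1(props).hexdigest(), new_rev)
    return revision


props = b"svn:eol-style=native"
key = hashlib.sha1(props).hexdigest()
txn = [("trunk/README", props)]

cache_a = {key: 5}   # this repo already knows the rep
cache_b = {}         # this repo missed an earlier rep-cache update

# Disable part 2 only; part 3 stays enabled.
opts = RepSharing(lookup_props_on_commit=False, update_cache_on_commit=True)
rev_a = commit(txn, cache_a, 10, opts)
rev_b = commit(txn, cache_b, 10, opts)
print(rev_a == rev_b, key in cache_b)
```

In this model the revisions come out identical regardless of the local 
rep-cache contents, and the rep-cache is still populated for use in part 1.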

For the options that involve FSFS code changes, WANdisco could fork it but 
would prefer to use the master version of FSFS.

Could I please hear your thoughts? How appropriate might it be to make such 
changes in FSFS, if they are potentially beneficial for other users of the FSFS 
API? Are there other considerations I should take into account?

-- 
- Julian
