TL;DR: the repository "instance-id" introduced in FSFS f7 doesn't make any difference to the on-disk representation of FSFS; can we please affirm that this will continue to be so?

== What is this instance-id? ==

(A brief summary for those who, like me, didn't know what this is.)

In Subversion 1.9 we introduced a repository instance-id in FSFS f7. It is stored as a second line in the "db/uuid" file. The log message tries to explain why:

  https://svn.apache.org/r1618138

Basically it is to disambiguate some potentially shared data in two svn_fs_t objects opened to repositories that have the same (primary) repository UUID. I am still not clear exactly what shared data it is used for and among which processes that data can be shared.

The log message also mentions some scenarios where having different instance-ids is important (if they have the same primary UUID). Three of these that I would like to mention here are:

  * during "svnadmin hotcopy repo1 repo2"
  * during "svnadmin freeze repo1 (svnadmin freeze repo2 (...))"
  * serving repo1 and repo2 from the same Apache httpd instance
    (in some configurations)

The second email thread linked from that log message contains most of the interesting discussion:

  http://svn.haxx.se/dev/archive-2014-08/0093.shtml

== Why it matters ==

WANdisco's Svn Multisite Plus (MSP) replicates and synchronizes Subversion repositories, initially using rsync and thereafter through its own synchronization software. Until now those replicas have been bit-for-bit identical, and consistency checking has included verifying that the repositories remain bit-for-bit identical.

I'm aware that we don't guarantee that a repository will be bitwise predictable when written to (and thus that two instances will remain bit-for-bit identical). But under these conditions it has been, and this has been useful.

Replicas are generally kept on physically separate servers, and served by separate Apache httpd instances. Two replicas are never accessed together by the same process, in normal use. It is unlikely but conceivable that an administrator might encounter one of the scenarios where the instance-id matters.

WANdisco asked me to advise on what to do. It seems the correct thing to do with "instance-id" is to make that field deliberately different on each instance of the repository. Consequently, the consistency check will have to ignore differences in the instance-id line of the "db/uuid" file.
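A consistency check along those lines could look roughly like the following. This is a minimal Python sketch, not MSP's actual mechanism: it assumes each replica is a plain directory tree and, as described above, that "db/uuid" holds the primary UUID on its first line and the instance-id on its second. The function names are hypothetical.

```python
import filecmp
from pathlib import Path

# Relative path of the file holding the UUID (line 1) and instance-id (line 2).
UUID_FILE = Path("db") / "uuid"


def _primary_uuid(path: Path) -> bytes:
    """Return only the first line of db/uuid, i.e. the primary repository UUID."""
    with path.open("rb") as f:
        return f.readline().rstrip(b"\r\n")


def _relative_files(root: Path) -> set:
    """All regular files under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def replica_differences(repo_a: str, repo_b: str) -> list:
    """List differences between two replicas, ignoring the instance-id line.

    Hypothetical helper: every file must match byte for byte, except that
    only the first line of db/uuid is compared.
    """
    a, b = Path(repo_a), Path(repo_b)
    files_a, files_b = _relative_files(a), _relative_files(b)
    diffs = []

    # Files present in only one of the two replicas.
    for rel in sorted(files_a ^ files_b):
        diffs.append(f"only in one replica: {rel.as_posix()}")

    # Files present in both: compare contents.
    for rel in sorted(files_a & files_b):
        if rel == UUID_FILE:
            same = _primary_uuid(a / rel) == _primary_uuid(b / rel)
        else:
            # shallow=False forces a chunked byte-by-byte content comparison.
            same = filecmp.cmp(a / rel, b / rel, shallow=False)
        if not same:
            diffs.append(f"content differs: {rel.as_posix()}")
    return diffs
```

For example, replica_differences("/srv/svn/repo1", "/srv/svn/repo1-replica") (placeholder paths) would return an empty list when the two replicas differ only in their instance-ids. Using filecmp.cmp rather than reading whole files keeps memory use flat for large revision files.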

== Question ==

WANdisco would like assurance that differences in instance-id will not cause any differences in the repository's on-disk data (other than in "db/uuid" itself, of course). I suggest we are talking about the lifetime of FSFS format 7; the features of a future format are unknown.

Can I tell them that this is the expectation, and that we won't change that situation without a good reason?

- Julian
