Karl Fogel wrote:
> Does the current design involve a new per-wc flag that indicates 
> something about a pristines-(present|absent) mode?
> 
> [...] ideally we would record that fact solely on a 
> per-file basis. [...]

That's a good question to ask, as it's a little complex.

TL;DR:

  - There is consensus to store the *desired behaviour* mode at the WC level.
  - There is no need in principle to store a *current state* indicator
("all present?") at the WC level. However, there is a performance
overhead to consider, and such a state flag is one possible way to
address it, though probably not the best.

I agree it is a Good Thing for the design to rely on per-pristine state,
handling all behaviour in a single general case, rather than adding a
separate (cached) state flag that duplicates this knowledge. That is
simpler and more reliable. Such a design is possible in principle.

The current patch (but we can/should change it) adds a "mode" setting at
the WC level which combines two semantics:

  (1) the desired fetching/discarding policy (initially either keep all
pristines or use the current pristines-on-demand behaviour), and also
  (2) one of the modes has the additional meaning of "assume all
present: skip the check and expect to error out if assumption was wrong".

Point (1) comes from the consensus that we should store the current
"pristines mode" setting in the WC, so that if the WC is portable or is
used with different user config settings, we avoid flip-flopping its
pristine store between empty-ish and full. (This mode will be set
initially at checkout (or upgrade, etc.), according to the user's
config file and/or a command-line option.)
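
Point (1) can be illustrated with a minimal sketch in Python. All names
here (PristinesMode, mode_for_new_wc, effective_mode) are hypothetical
and are not part of Subversion's API. The precedence of the command-line
option over the config file, and "keep all" as the default, are
assumptions made only for the example.

```python
from enum import Enum
from typing import Optional


class PristinesMode(Enum):
    """Hypothetical names for the two initial policies."""
    KEEP_ALL = "keep-all"
    ON_DEMAND = "on-demand"


def mode_for_new_wc(config_value: Optional[str],
                    cli_value: Optional[str]) -> PristinesMode:
    """Choose the mode once, at checkout (or upgrade, etc.).

    Assumption: a command-line option beats the config file, and the
    default is to keep all pristines.
    """
    chosen = cli_value or config_value or PristinesMode.KEEP_ALL.value
    return PristinesMode(chosen)


def effective_mode(stored: PristinesMode,
                   config_value: Optional[str]) -> PristinesMode:
    """For later operations, the mode recorded in the WC wins.

    The user's current config is deliberately ignored, so a WC shared
    between differently configured users does not flip-flop.
    """
    return stored
```

With this shape, a WC checked out as "keep-all" stays that way even when
a later command runs under a config that says "on-demand".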

I'll explain about point (2) and the alternatives to it.

In practice, the current implementation adds a tree scan with
non-negligible cost. To maintain no overhead until opt-in, it seems we
would need to skip this scan. I discussed other potential optimisations
for it and they are generally difficult and limited, given the overall
shape of this design which wants to populate pristines in bulk before
starting an operation, rather than at the moment of access.

I can see three possible approaches, of which the third looks best.

1. The special "assume all present" semantics in the per-WC mode
setting. It has the disadvantage we both mentioned, basically a less
clean design which could lead to further problems down the road. It
doesn't seem to have any show-stopper problems for a first cut.

2. One alternative would be to change the design such that pristine
fetching is done right at the point of use, rather than fetching
everything we might need before a whole client operation. That would
make it simple to use per-pristine state with no overhead. On the other
hand, authentication callback(s) would happen some time after an
operation has started rather than before it begins, and the client might
need to re-connect and re-authenticate. Evgeny and others suggested
that would be undesirable, creating a more complex
interaction pattern and network connection pattern for the client to
deal with. I have not tried to evaluate how this might impact clients,
especially GUI-type clients.

3. Another alternative would be to make Subversion detect the "all
pristines present" state on its own, and skip the scans then. There are
two ways to implement that:

  - Augment the current implementation to cache that state in the WC
between runs, separately from the desired mode option; or
  - Switch the order of the DB and file-stat scan phases to do the DB
phase first (low overhead) and skip the file-stat scan phase if the DB
knows all pristines are present.

The second implementation option there has advantages:

  - Simplicity of design: no cached state to maintain.
  - This optimises operations on any all-pristines-present subtree, even
when pristines are missing elsewhere. This might not be common but it's
something. The cached state flag approach either works only for the case
when the whole WC is populated, or would require the additional
complexity of a state flag per subtree.
  - It also optimises out the stats in the dehydrate-only pass after an
operation, for files that are already dehydrated, which may well be common.
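
The "DB phase first" ordering can be sketched roughly as follows, in
Python with an in-memory SQLite database. The schema (tables "nodes" and
"pristine", with a "hydrated" column) and the function names are
hypothetical and are not Subversion's actual wc.db layout. The
is_modified callback stands in for the per-file stat check. The sketch
also assumes that only locally modified files need their pristine
fetched.

```python
import sqlite3
from typing import Callable, List


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                               hydrated INTEGER NOT NULL);
        CREATE TABLE nodes (relpath TEXT PRIMARY KEY,
                            checksum TEXT NOT NULL);
    """)
    return db


def _relpaths(db: sqlite3.Connection, subtree: str,
              hydrated: bool) -> List[str]:
    """DB-only query: files at or under `subtree` whose pristine is
    (or is not) present. An empty `subtree` means the whole WC."""
    prefix = subtree + "/" if subtree else ""
    rows = db.execute(
        "SELECT n.relpath FROM nodes AS n"
        " JOIN pristine AS p ON p.checksum = n.checksum"
        " WHERE p.hydrated = :h"
        "   AND (n.relpath = :t"
        "        OR substr(n.relpath, 1, length(:p)) = :p)"
        " ORDER BY n.relpath",
        {"h": int(hydrated), "t": subtree, "p": prefix})
    return [row[0] for row in rows]


def files_needing_fetch(db: sqlite3.Connection, subtree: str,
                        is_modified: Callable[[str], bool]) -> List[str]:
    """Phase 1 asks the DB which pristines are missing; phase 2 stats
    only those files. No stat happens if nothing is missing."""
    candidates = _relpaths(db, subtree, hydrated=False)
    if not candidates:
        return []  # all pristines present: skip the file-stat scan
    return [path for path in candidates if is_modified(path)]


def dehydrate_candidates(db: sqlite3.Connection,
                         subtree: str) -> List[str]:
    """After an operation, only files whose pristine is still present
    need a stat; already-dehydrated files are skipped."""
    return _relpaths(db, subtree, hydrated=True)
```

Because the DB query is scoped to the target subtree, the stat phase is
skipped for a fully populated subtree even when pristines are missing
elsewhere in the WC, with no cached flag to keep in sync.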


Conclusions:

  - This is currently based on my assumption that the overhead
introduced needs to be eliminated until opt-in. We haven't published
measurements to back this up, and should.
  - "switch the scan phases and optimise out the stats" looks worthy of
further investigation, possibly the best thing we can do.
  - A desired mode value that carries an assumption of
all-pristines-present (skip the checks) may be kept in mind as a
fallback if a better solution runs into a blocker.


Thoughts?

- Julian
