Karl Fogel wrote:
> Does the current design involve a new per-wc flag that indicates
> something about a pristines-(present|absent) mode?
>
> [...] ideally we would record that fact solely on a
> per-file basis. [...]

That's a good question to ask, as it's a little complex.

TL;DR:

  - There is consensus to store the *desired behaviour* mode at the WC
    level.

  - There is no need in principle to store a *current state* indicator
    ("all present?") at the WC level. However, there is a performance
    overhead issue to consider, and such a state flag is one possible
    way to address it, though probably not the best.

I agree it is a Good Thing for the design to rely on per-pristine state,
to avoid adding a separate (cached) state flag that duplicates this state
knowledge, and to handle all behaviour in a single general case. That is
simpler and more reliable. Such a design is possible in principle.

The current patch (but we can/should change it) adds a "mode" setting at
the WC level which combines two semantics:

  (1) the desired fetching/discarding policy (initially either keep all
      pristines or use the current pristines-on-demand behaviour); and

  (2) for one of the modes, the additional meaning of "assume all
      present: skip the check and expect to error out if the assumption
      was wrong".

Point (1) comes from the consensus that we should store the current
"pristines mode" setting in the WC, so that if the WC is portable or is
used with different user config settings, its pristine store does not
flip-flop between empty-ish and full. (This mode setting will be set
initially at checkout (or upgrade, etc.), according to the user's config
file and/or a command-line option.)

I'll explain point (2) and the alternatives to it.

In practice, the current implementation adds a tree scan with
non-negligible cost. To maintain no overhead until opt-in, it seems we
would need to skip this scan. I discussed other potential optimisations
for it, and they are generally difficult and limited, given the overall
shape of this design, which wants to populate pristines in bulk before
starting an operation rather than at the moment of access.

I can see three possible approaches, of which the third looks best.

1. The special "assume all present" semantics in the per-WC mode
   setting.

   It has the disadvantage we both mentioned, namely a less clean design
   which could lead to further problems down the road. It doesn't seem
   to have any show-stopper problems for a first cut.

2. One alternative would be to change the design such that pristine
   fetching is done right at the point of use, rather than fetching
   everything we might need before a whole client operation.

   That would make it simple to use per-pristine state with no overhead.
   On the other hand, authentication callback(s) would happen some time
   after an operation starts rather than before it begins, along with
   potential re-connection and re-authentication. Evgeny and others
   suggested that would be undesirable, creating a more complex
   interaction pattern and network connection pattern for the client to
   deal with. I have not tried to evaluate how this might impact
   clients, especially GUI-type clients.

3. Another alternative would be to make Subversion detect the "all
   pristines present" state on its own, and skip the scans in that case.
   There are two ways to implement that:

   - Augment the current implementation to cache that state in the WC
     between runs, separately from the desired mode option; or

   - Switch the order of the DB and file-stat scan phases to do the DB
     phase first (low overhead), and skip the file-stat scan phase if
     the DB knows all pristines are present.

   The second implementation option has advantages:

   - Simplicity of design: no cached state to maintain.

   - It optimises operations on any all-pristines-present subtree, even
     when pristines are missing elsewhere. This might not be common but
     it's something. The cached state flag approach either works only
     when the whole WC is populated, or would require the additional
     complexity of a state flag per subtree.

   - It also optimises out the stats in the dehydrate-only pass after an
     operation, for files that are already dehydrated, which may well be
     common.

Conclusions:

  - This is currently based on my assumption that the overhead
    introduced needs to be eliminated until opt-in. We haven't published
    measurements to back this up, and should.

  - "Switch the scan phases and optimise out the stats" looks worthy of
    further investigation, and is possibly the best thing we can do.

  - A desired-mode value that carries an assumption of
    all-pristines-present (skip the checks) may be kept in mind as a
    fallback if the better solution runs into a blocker.

Thoughts?

- Julian
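
To make the "switch the scan phases" idea a little more concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not the patch and not Subversion's API: the `node` table, its columns, and the names `in_subtree` and `pristines_to_fetch` are all hypothetical, and it assumes the file-stat phase exists to decide whether a working file looks modified (size or mtime differs from the recorded values). The point it shows is that a cheap DB query runs first, and the stats happen only for files whose pristine the DB reports as absent.

```python
import os
import sqlite3

# Hypothetical, simplified schema: one row per versioned file.
# This is NOT Subversion's real wc.db layout.
SCHEMA = """
CREATE TABLE node (
    local_path TEXT PRIMARY KEY,
    size       INTEGER NOT NULL,  -- recorded size of the unmodified file
    mtime_ns   INTEGER NOT NULL,  -- recorded mtime of the unmodified file
    hydrated   INTEGER NOT NULL   -- 1 if the pristine is present on disk
)
"""


def in_subtree(path, subtree):
    """True if `path` is `subtree` itself or lies beneath it ('' = whole WC)."""
    return subtree == "" or path == subtree or path.startswith(subtree + "/")


def pristines_to_fetch(db, subtree, stat=os.stat):
    """Return paths under `subtree` whose pristine is absent and whose
    working file looks modified, so the pristine would need fetching."""
    # Phase 1 (DB only, cheap): which files in this subtree lack a pristine?
    absent = [
        row
        for row in db.execute(
            "SELECT local_path, size, mtime_ns FROM node "
            "WHERE hydrated = 0 ORDER BY local_path"
        )
        if in_subtree(row[0], subtree)
    ]
    if not absent:
        # All pristines present in this subtree: skip the stat phase entirely.
        return []

    # Phase 2 (file stats, costly): only for the files found in phase 1.
    needed = []
    for path, size, mtime_ns in absent:
        st = stat(path)
        if st.st_size != size or st.st_mtime_ns != mtime_ns:
            needed.append(path)
    return needed
```

With this shape, a fully populated subtree costs one query and no stats, even if pristines are missing elsewhere in the WC, and no cached "all present" flag has to be kept in sync. The `stat` parameter is injectable only so the number of stat calls can be counted in a test.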