On 12.02.2012 02:52, Stefan Fuhrmann wrote: > The silly part of FSFS is that it does not optimize access > paths, yet, but stores changes individually. The challenge > is our two-dimensional key space and the fact that different > operations traverse the data along different dimensions > (e.g. log ./. checkout).
Interestingly enough, the 2D keyspace isn't that big a problem. The real issue is that we don't even represent all the relevant keys, because of the lazy copying of subtrees. That's what actually prevents us from doing one-shot lookups of arbitrary path@rev; and even then, we'd only really have to do a step-by-step top-down resolve if the initial fast lookup failed. > Question: how many entries would a direct lookup structure > need to have (i.e. path@rev -> data pointer)? Keep in > mind that may valid paths like /branches/foo/bar will never > be mentioned anywhere in a SVN repository because they > never got touched under that name. A rough estimation for > a fully expanded list of entries is > > #nodesInTrunk * #openBranches * #revisions > > This yields 10^9 entries for small repositories and >10^14 > for KDE-sized ones. Clearly impractical. So that's not how you do it. :) You'd only need one reference per actual node, not per possible node lookup paths including revisions. Obviously you have to resolve path@rev to a concrete node before you can do anything with its attributes. With the caveat that there's a nasty edge case involving not-yet-lazy-copied child nodes; but given the way we currently crawl the tree, that's not really an issue. -- Brane