On Wed, Sep 18, 2013 at 1:05 PM, Stefan Fuhrmann < stefan.fuhrm...@wandisco.com> wrote:
> On Fri, Sep 13, 2013 at 12:41 PM, Branko Čibej <br...@wandisco.com> wrote: > >> On 13.09.2013 11:32, Philip Martin wrote: >> > Branko Čibej <br...@wandisco.com> writes: >> > >> >> That said, I still do not understand why a different ID would be needed >> >> before the copy-on-write happens. Is it because the client doesn't have >> >> the full history available? If that's the case, I suggest we explore >> >> this on a case-by-case basis, including determining how the initial >> >> state of a working copy for each case can actually occur. >> > Is the FS going only able to provide the moves that apply between two >> > consecutive revisions? Or is it able to provide the combined moves >> > between arbitrary revisions? >> > >> > One way to provide the moves between arbitrary revisions is to iterate >> > through the intervening revisions accumulating and combining the moves. >> > That has the potential to be slow. >> > >> > Another way to provide the moves between arbitrary revisions is to have >> > an id to path map per revision which allows the FS to find the path >> > associated with a given id. However with lazy-copy this map is harder >> > to implement. >> > >> > There is another aspect to the lazy-copy which is when does the new >> > copy-id get assigned to the lazy children. If we commit >> > >> > move A/f A/g >> > >> > then move does not allocate a new copy-id and A/f has the same copy-id >> > as A/g. I think we intend this to be true if the commit combines a move >> > and a modification to the node. Now commit: >> > >> > copy A B >> > >> > here B gets a new copy-id and lazily copied children of B still have the >> > old copy-id. Now what about this commit: >> > >> > move B/g B/h >> > >> > Does move preserve the copy-id so that B/h is still a reference to A/g? >> >> A move through the copied parent has to be interpreted as a write to the >> subtree, which means that the copy-on-write semantics kick in. The move >> then breaks down into: >> >> make-mutable B/g <-- lazy copy, assigns a new copy-id >> move B/g B/h <-- move semantics, B/h keeps same copy-id >> >> You'll not that "make-mutable" is an implementation detail of the >> top-down DAG FS model, and it already does what I described above. This >> is not some new code we'd have to write to implement moves this way; we >> just have to obey existing rules, i.e., before operating on a path >> within an FS transaction, the path must first be made mutable. In other >> words, the FS implementation already works the way I described, >> regardless of whether the actual operation is "move" or something else. >> >> > If the commit was a combined move/modification then B/h would have to >> > get a new copy-id. >> >> It has to get one in any case, as explained above. Operations that >> affect the state of a node, when performed on a path that contains a >> copied parent, must produce the same result regardless of whether the >> filesystem implements lazy copying or not. For the purpose of this >> model, the node's path is part of its state; although of course the path >> is not in fact a unique property of the node. >> >> But it's not necessary to assign new IDs when the node's state is >> /read/; i.e., we do not need copy-on-read in order for this model to >> work. The only consideration is that all operations must work correctly >> with or without lazy copying. It's OK IMO to let the lazy-copy detail >> leak outside the FS implementation, as long as we do not make it a >> required feature. >> > > Hi Brane, > > As described above, the ID assignment works for FSFS > internals but is insufficient to answer the following question: > > Are pathA@rA and pathB@rB points on the same line of history? > > The more theoretical background can be found in my other post, > but the following example should illustrate the problem: > > rN: cp A B, add C (A has sub-nodes A/1, A/1/a, A/1/b, A/1/c) > rN+1: mv B/1 B/3, modify A/1/c, modify B/3/a > rN+2: mv A/1/c B/2, mv A/1/b C/2 > rN+3: modify B/2, modify C/2 > > Now, the following statements are true ("related" here means > "on the same line of history"): > > (1) A/1/b@N and A/1/b@N+1 have the same IDs and are related > (2) A/1/a@N and B/1/a@N+1 have the same IDs and are unrelated > > (3) A/1/c@N and A/1/c@N+1 have same (node,copy) and are related > (4) B/1/c@N and A/1/c@N+1 have same (node,copy) and are unreleated > (5) B/1/a@N and B/3/a@N+1 have different copyIDs and are related > (6) A/1/a@N and B/3/a@N+1 have different copyIDs and are unrelated > > (7) A@N+1 and B@N+2 have different copyIDs and are unrelated, > A/1/c@N+1 and B/2@N+2 have the same IDs and are related > (8) A@N+1 and B@N+3 have different copyIDs and are unrelated, > A/1/c@N+1 and B/2@N+3 have the different copyIDs and are related > (9) A@N+1 and C@N+2 have different IDs and are unrelated, > A/1/c@N+1 and B/2@N+2 have the same IDs and are related > (10) A@N+1 and C@N+3 have different IDs and are unrelated, > A/1/b@N+1 and C/2@N+3 have the different copyIDs and are related > > (11) A@N and C@N+1 have different nodeIDs and are unrelated > > I.e. only a different nodeID can guarantee a different line of history > but this is a relatively rare case with node and copy IDs matching > being the vast majority. > > For the same line of history as well as different lines of history > > * (nodeID, copyID) may be the same and revIDs *differ or not* > * nodeID may be the same and copyID + revID may *differ or not* > * parents may be *related or not* > > The latter is the most devastating part since it makes a top-down > approach impossible where matching IDs could at least guarantee > relatedness (mismatches would require further investigation). > > My feeling is that the fundamental problem is the combination of > * #DAG nodes total < #paths @ rev > * moves are not local, i.e. obliterate any parent context > > So, do are you still convinced that nodeID and copyID are useful > for the detection of moves? If so, how do they apply to the cases > listed above? > A more concrete example: (/trunk/A/ contains *.c and *.h files) rN: cp /trunk/A /trunk/B, mv /trunk/B/* /trunk/B/b_* modify /trunk/A/*, /trunk/B/b_* rN+1: cp /trunk /branch rN+2: add /branch/include, mv /branch/A/*.h /branch/include/*.h, mv /branch/B/*.h /branch/include/*.h rN+3: modify /branch/A/*.c, /branch/B/*.c, /branch/include/*.h rN+4: add /branch/src, mv /branch/A/*.c /branch/src/*.c, mv /branch/B/*.c /branch/src/*.c, The user wants to update her working copy from /branch@rN+1 to /branch@rN+3. How do you identify the moves? -- Stefan^2.