Hi all, here are some points i'd like to discuss before the sprint concerning camp. It is both my view of the current situation of camp and camp-theory (especially camp-coq) and a call to see what we're going to do during the sprint. There we go.
A/ camp mailing-list versus darcs-users mailing list. I discovered yesterday the existence of camp's own mailing list, which sees very few usage. darcs-user's traffic is also not that big, and given that it would be nice for everyone here to know in what shape the future darcs-3 is. I propose to use darcs-user for both camp and darcs for normal discussion, and respectively the camp and darcs-devel lists for automated traffic. Is it ok with everybody? Side question: does camp want a space on darcs-wiki, what'd we do with it? B/ camp as darcs-3 / hashed-storage This leads us into the second point: do we have an estimation for the maturity of camp-core (and additionally hashed-storage)? Is it the right time to marry them? Do we have a plan for the various transitions? When do we start coding new features for camp rather than darcs? I don't have an opinion on these questions, but I think the crucial points to decide are: -does camp become darcs-3, or do camp-ideas get ported to darcs'(s?) code? -does camp adopt hashed-storage? is it a goal for the sprint? -if both answers are yes, when do we switch to camp? Do we plan a darcs-2.5 with darcs-core and hashed-storage? C/ camp-paper and camp-coq. Now for the theory of camp. These are mostly internal to camp, but should be decided soon. The current situation: camp's theory lives in two repositories: -Ian's camp repository http://code.haskell.org/camp/devel/paper/, which contains both the latex paper with the theory, and a (very partial) coq formalisation. -My repository http://patch-tag.com/publicrepos/dic (where dic stands for darcs-in-coq), which contains a (very partial) formalisation of Ian's theory, but diverges with it in several points (for some of which, in my understanding, camp's implementation diverges too from the paper). This situation, with 4 different sources of "truth" about camp must be resolved, preferably before the end of the sprint. For the sake of clarity, camp-code refers to camp's implementation in haskell, camp-paper to the theory in Ian's paper, camp-coq to Ian's coq formalization, and dic to mine. Note that the situation is not that desperate, since camp-coq and dic are on the way to converging, but their fusion entails a couple of decisions, which are going to affect camp-paper, as we put it back into sync. I feel that camp-code should not be (yet) affected by these theory changes. Here is my plan for the fusion of dic and camp-coq, Ian, i want your opinion on that. 1/ coqdoc integration This is a must for the final result, which we'll inherit from camp-coq. 2/ modules organisation dic and camp-coq seem to agree that modules are the way to go. Inherit from both. 3/ sensible sequences versus states In darcs-theory as well as in camp-theory, not every sequence of patches is acceptable, since not every patch applies in every situation (Not that the word "context" has another meaning in camp-paper). On that point, camp-paper takes the view that acceptable sequences are defined by a bunch of axioms, and are called "sensible sequences". Sensible sequences are thus a subset of lists of patches defined by a bunch of axioms. Note that, according to Ian, a sensible sequence goes from the empty repostate to somewhere (it is "grounded") In camp-code and dic (and also in darcs), the view is that patches have a from and a to states, and that sequences are encoded with a FL type that makes sure that these from and to states are legally chained. camp-coq does not seem to care about that issue (ie, at the moment, it does not encode this constraint). My view on this point is: -camp-paper's axiom are incomplete. We need at least one more axioms that says that if sp and sq are sensible sequences, then so is spp^q. Without this axiom, nothing says that p^ undoes p, we only know its behaviour with respect to commutation. I also think that at some point, we'll need to introduce ungrounded sensible sequences, if only as a device in proofs, with the axiom that if sp and ps' are sensible, so is sps'. If we do so, then I think that one can show that there is a set of states such that each patch can be associated with a from and a to state, and every ungrounded sequence is sensible if and only if the from and to states match. -camp-code's (and dic's) stateful patches (where states are an _abstract_ type) are much easier to work with in coq (and in code) than a bunch of axioms, as the states prevent most mistakes. Note that using states makes proving that every sequence you produce is good automatic, whereas you need a proof with each operation when using sensibility. My preferred course of action would be to check the consistency between states and sensible sequences (if only as a check of consistency of camp-code and camp-paper). Then use states everywhere (if only because otherwise, proving camp-code is going to be hell). 4/ Unnamed marked sequences before introducing names (they don't exist yet in dic), dic relies on a concept which is not present in camp-paper, nor in camp-coq. That concept is that of marked sequences (shuffle_mark_[1,2] in dic). The idea is that if s (with p marked) can be shuffled into t (with q marked), then when shuffling s into t, p "lands" on q. (There is also a version with two marks). Additionally, one will need a lossy version, with s (with p marked) can be turned into t (with q marked) if by shuffling patches in s, adding and removing patches, p "lands" on q. This allow to prove some more lemmas, which are going to be useful for named patches manipulation. Notation: in the following, " s!p!s' " means " sps' " with p marked. This concept does not make sense for camp-code, since it's a proposition that says "it's possible to get from here to there". We're not manipulating such propositions in the algorithm, only proving that they hold. I also feel that this kind of properties is the real use behind names: comparing two patches' names is just a way to know whether they come from the same record. That's why I want to give them a bigger role, and make them explicitly appear in the paper. Hence, i suggest take the definitions and lemmas from dic, and add them to camp-paper. This is especially important in the light of point 7. 5/ Zigzag sequences (and their absence) This is a point that's absent in all sources alike, but which we will need to take into account somehow. We can sum it up by saying that the subsequence pp^p is evil, and should probably not appear in a repository (at least, when camp is not being invoked). We need to be able to mark which sequences are good in that respect. I propose to say that a _named_ sequence has a zigzag if the same name appears twice in it. What this means in _unnamed_ terms is that a sequence has a zigzag if it can be decomposed into sps'p's'' and for some p'', the sequence !p''! can be turned both into s!p!s'p's'' and into sps'!p'!s''. A sequence without zigzag is straight. This second formalisation seems a bit more complex when written into ascii, but it really is the kind of reasonning you do on paper when you treat patches with the same name ("this is really the same guy that went from here to there with a serie of commutations"). It is natural once you get through the slight notational noise. Then we need to check the fact that all operations preserve straightness. 6/ names as a function or a component of patches This really is a technical coq point (even moreso than the previous ones). In camp-coq: a: We have axioms about unnamed patches b: From these given a set of names, we can build named patches c: Then we can show (as lemmas) the unnamed patches axioms for named patches In dic: a: We have a set of axioms about patches b: We have additionnal axioms for named patches, so that named patches are a subset of general patches. Among these axioms is the existence of a "name" function that is preserved by commutation. I feel dic's approach is better. First, it allows to prove that a system where names are computed on the fly, whereas camp-coq's approach gets us stuck with stored names Second, Ian's proofs that 'c' holds are difficult, whereas they should be easy (blame coq). I pretend that given his namedPatch datatype, and projection as the name function, showing they respect my axioms of named patches is trivial. Given such proof, i think the most reasonable course of action is to go with my approach. 7/ Where to introduce names? This is the biggest point of divergence between my approach and Ian's. What I propose is quite a radical change, but i feel that: -My approach is more elegant, both for paper and computer proofs. -Given my experience of manipulating functions and equality in coq, it's good practice to try to avoid them. Currently, we (camp-paper) only define the merge _algorithm_ and catches on top of named patches. This means that we have a lot of places where we say "if the names are equal, do this, else do that". This is a very good way to define algorithms, but horrible for when we need to prove them: it does not carry what we gain from knowing that the two patches have the same name or not. The best witness of this is the existence of patch universes. Universe. In the plural! That's a sign that something is not going right. They are not quite convincing on paper (try getting a paper reviewed with "the allowed operations are…"), but on coq, it's going to be horrible to deal with them. The simple thought that i might have to invert those guys makes me shiver. And them we'd have to reprove the axioms a third time for patches in patch universes! Also, if the patch universe appears in the type of patches, then this type becomes dangerously self-referential. Otherwise, we will have a lot of spurious "exists u: universe, … p in u -> q in u -> …" hypotheses. With these hypotheses, everytime you have a new patch in your proof you need to prove that it is in the same universe as p and q. Two solutions: -add the hypothesis that u is in fact "saturated": it is closed under all operations but record. But then you need to saturate your universes to use your theorems :-( -construct u' that contains p, q and your new patch. This is going to get boring really quick, and will make proofs ugly. My proposal is to end the madness. I propose to add an axiom that says that we can decide whether two patches are related, and draw the consequences. That is, given two marked sequences s!p!s' and t!q!t', we get -either !r! and 2 sequences of commutations/patch addition/patch deletions getting us onto s!p!s' and t!q!t' (ie, name p = name q) -or u!p'!v!q'!w (or u!q'!v!p'!w) without zigzag, and two sequences of operations going from u!p'!vq'w to s!p!s' and up'v!q'!w to t!q!t' (ie, name p <> name q). This corresponds to "getting back in time" to where the two (maybe one) patches were first recorded. Then we add the requirement that comparing names does give us that decision. This is something we'll have to do anyway, and it's important to be able to isolate it from the construction of catches and other stuff. It's one additional axiom, but one less hypothesis for every theorem. I also think it's the right solution if we want to prove some weakened version of camp. For example, instead of wanting distributed generation of unique names (which is impossible), we might want cryptographically unfalsifiable name (that is, if an attacker can break cryptography, he can create a patch with the same name as a previous patch). I think that the fact that it works would be provable in my context, but much harder if we go the "traditionnal" road. This is the most important point for us to settle i think (maybe there is another solution to make patch universes workable, but i wouldn't bet much on it). Ian, i expect your opinion «de pied ferme». Well, thanks for reading down here folks Florent. _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
