On Wed, Aug 09, 2006 at 08:45:51AM -0700, Steven E. Harris wrote: > Nathaniel Smith <[EMAIL PROTECTED]> writes: > > > To make our life simpler, we always send stuff in such an order that > > we can simply stream it straight to the db and consistency is always > > maintained; that way we can simply hit "commit" at any time. > > So the sender orders items with respect to the dependency graph such > that any referenced items arrive before any items referencing them?
Yes. > > Well, merkle tries or the rsync algorithm > > I've been thinking about this for several days now, and it's probably > fodder for a new thread. > > Can you explain how monotone uses Merkle Tr(e|i)es during > synchronization? Specifically, what are the items or data stream being > summarized by the hash tree? My first guess is the revision graph, > flattened somehow, but feel free to correct and elaborate on that. The merkle trie synchronization algorithm is a set synchronization algorithm -- given a set of items here and a set of items there, each with a unique id, it lets each side figure out what it needs to send so that the other side will end up with everything in either set. We use it 4 times, all in parallel: -- on keys -- on epochs -- on certs -- on revisions The first three of these are genuinely sets; the last one has some topology that in principle could be exploited for more efficient synchronization, but it's not clear how to actually do this in practice. We just treat them as a homogenous set, and then sort everything between the merkle refinement and the actual sending. > > to make transmissions idempotent > > I can see how using hash trees avoids having to retransmit data > already held on both sides, but I take idempotent to mean that > transmitting the data (or doing anything) twice has no distinguishable > effect as opposed to doing it only once. Did you mean "avoid sending > more than we need to" or "don't worry if we send more than we need > to"? Yeah, robustness against accidentally sending things multiple times is indeed nice. I more meant the property that everything is stateless, and you can continue interrupted transfers easily and cheaply (this is like how if you need to do a file transfer over a lousy connection, you use rsync, because if you just keep repeating the command long enough it eventually finishes). Now that you mention it, I'm not actually sure why I called that "idempotent".. But anyway, yeah -- content addressing makes storage easier and reduces the severity of bugs, stateless protocols make everything easier and reduce the occurrence of bugs, the ability to magically pick up where one left off, without any special code paths, definitely reduces user frustration and increases error recovery. > > plus content-addressing > > This one has proved to be a stumper trying to find references on the > Web, trying to avoid having to ask here. My best guess is that you're > referring to using hash values for identity, so that the same content > maps to the same identity value, making it easy to discover equal > content. But the phrase also suggests being able to look something up > with a special key. Can you clarify? That's it exactly. See http://en.wikipedia.org/wiki/Content_addressed_storage for instance (though that is indeed somewhat different than what we do... citeseer might be better than google in this case, if you really want background reading). -- Nathaniel -- "If you can explain how you do something, then you're very very bad at it." -- John Hopfield _______________________________________________ Monotone-devel mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/monotone-devel
