Re: [jr3] clustering

Michael Dürig Thu, 01 Mar 2012 05:51:21 -0800

Hi Marcel,

Nice to see new ideas wrt. to clustering. I quickly skimmed over the HPpaper and think that approach sounds promising. Having a flexiblepartitioning scheme which also supports dynamic repartioning (i.e.additions of server, migration of nodes) seems a better approach to methan fixed partitioning by path as we discussed earlier.

An open question though is how replication would fit into this picture.There is some mention in the paper about backup nodes for fail-over. Notsure if that is what we are aiming for or whether we want to go beyondthat.

The paper assumes network connections to be reliable (i.e. no messagesaltered, dropped or duplicated). However there is no mention on how thesystem would recover from a partitioned network. That is, how it wouldrecover when some links go down and come up later. However, since ituses 2 phase commit, I think it would basically inherit that behaviourwhich means cluster nodes could become blocked (See [1] proposition 7.1and 7.2).

OTOH the combination of optimistic locking during the transaction itselfand pessimistic locking only for the commit itself will probably resultin very good write throughput. Even more so since probably in many casesthere is only a single node involved in the transaction such that asimple commit suffices.


More comments see inline below.

[1] http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx

On 1.3.12 11:05, Marcel Reutegger wrote:

[...]


so, I was thinking of something similar as described in this
paper [1] or similar [2]. since a B-tree is basically an ordered
list of items we'd have to linearize the JCR or MK hierarchy. I'm
not sure whether a depth or breadth first traversal is
better suited. maybe there even exists a more sophisticated
space filling curve, which is a combination of both. linearizing
the hierarchy on a B-tree should give some since locality for
nodes that are hierarchically close and probability is high that
they are requested in succession.

Node types may give hints here. As long as they are not recursive (i.e.nt:hierarchy) node types usually define "things that belong together".


[...]

Open questions:

how does MVCC fit into this? multiple revisions of the same
JCR/MK node could be stored on a B-tree node. whenever
an update happens the garbage collection could kick in an
purge outdated revisions. providing a consistent journal across
all servers is not clear to me right now.

I think MVCC is not a problem as such. To the contrary, since it isappend only it should even be less problematic. IMO garbage collectionis an entirely different story and we shouldn't worry too much about ituntil we have a good working model for clustering itself.

Wrt. the journal: isn't that just the list of versions of the root node?This should be for free then. But I think I'm missing something here...


How does backup work? this is quite tricky because it is
difficult to get a consistent snapshot of the distributed
tree.

MVCC should make that easy: just make a backup of the head revision atthat time.


Michael


Regards
  Marcel


[0] 
https://docs.google.com/presentation/pub?id=131sVx5s58jAKE2FSVBfUZVQSl1W820_syyzLYRHGH6E&start=false&loop=false&delayms=3000#slide=id.g4272a65_0_39
[1] http://www.hpl.hp.com/techreports/2007/HPL-2007-193.pdf
[2] 
http://research.microsoft.com/en-us/people/aguilera/distributed-btree-vldb2008.pdf

Re: [jr3] clustering

Reply via email to