how does MVCC fit into this? multiple revisions of the same
JCR/MK node could be stored on a B-tree node. whenever
an update happens the garbage collection could kick in and
purge outdated revisions. how to provide a consistent journal across
all servers is not clear to me right now.
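A minimal sketch of what "purge outdated revisions" could mean for a single node, assuming a revision is outdated once a newer revision of the same node is already visible to the oldest reader still active. The thread does not define this rule; `purge_outdated` and `oldest_needed` are hypothetical names used only for illustration.

```python
from bisect import bisect_right


def purge_outdated(revisions, oldest_needed):
    """Return the revisions of one node that must be kept.

    revisions: sorted list of revision numbers stored for the node.
    oldest_needed: lowest revision any reader may still ask for
    (an assumption, the thread does not say how this is tracked).
    """
    # Index of the first stored revision that is greater than oldest_needed.
    i = bisect_right(revisions, oldest_needed)
    # revisions[i - 1] is what a reader at oldest_needed would see, so it
    # must stay; everything before it can no longer be reached.
    keep_from = max(i - 1, 0)
    return revisions[keep_from:]


kept = purge_outdated([2, 7, 12], oldest_needed=9)
print(kept)
```

With stored revisions 2, 7 and 12 and no reader older than revision 9, only revision 2 is dropped, because a read at 9 still resolves to 7.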
I think MVCC is not a problem as such. To the contrary, since it is
append only it should even be less problematic. IMO garbage collection
is an entirely different story and we shouldn't worry too much about it
until we have a good working model for clustering itself.
Wrt. the journal: isn't that just the list of versions of the root node?
This should be for free then. But I think I'm missing something here...
the model I have in mind doesn't have root node versions that
correspond to MK revisions. Is this mandated somehow by the MK
API design?
in my model only the nodes that changed get new revisions.
and reading from the tree with a given revision means it
will pick the latest revision which is less than or equal to the
given revision. e.g. if you have a node /a/b/c which was changed three
times, in revisions 2, 7, and 12, and a client reads at revision 9, the
implementation will return revision 7.
I don't see why the parent node needs to be updated
when a child node is added, removed or updated.
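The read rule described above can be sketched as follows. This is only an illustration under assumptions: `VersionedNode`, `write` and `read` are hypothetical names, not part of the MK API, and each node is assumed to keep its own sorted list of revisions in memory.

```python
from bisect import bisect_right


class VersionedNode:
    """Hypothetical per-node history: only revisions in which this node
    actually changed are recorded."""

    def __init__(self):
        self._revisions = []  # revision numbers, kept sorted ascending
        self._values = []     # node state written at each revision

    def write(self, revision, value):
        # Assumes writes to a node arrive in increasing revision order.
        if self._revisions and revision <= self._revisions[-1]:
            raise ValueError("revisions must be strictly increasing")
        self._revisions.append(revision)
        self._values.append(value)

    def read(self, revision):
        """Return (revision, value) for the greatest stored revision that
        is <= the requested one, or None if the node did not exist yet."""
        i = bisect_right(self._revisions, revision)
        if i == 0:
            return None
        return self._revisions[i - 1], self._values[i - 1]


node = VersionedNode()  # stands in for /a/b/c
for rev in (2, 7, 12):
    node.write(rev, f"state@{rev}")

print(node.read(9))
```

Reading at revision 9 resolves to the entry written at revision 7, and nothing in the lookup touches the parent of /a/b/c.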
Hmm I see. I came up with a similar approach a loooong time ago. Even
before the Microkernel. Anyway, I think the Microkernel API does not
mandate root node versions corresponding to revisions. In fact I think
the approach you are proposing will scale better wrt. write contention
on the root node since there is no need for writing a new root node on
every write operation. However, getting a consistent journal across
cluster nodes seems more difficult here as you said.
How does backup work? this is quite tricky because it is
difficult to get a consistent snapshot of the distributed
tree.
MVCC should make that easy: just make a backup of the head revision at
that time.
hmm, I'm not sure that will scale. consider a large repository
where traversing all nodes takes a long time.
I think backup should be supported at a lower level to be
efficient.
Hmm right, that makes sense.
Michael
e.g. something like what is proposed in [0], section 4.9.
regards
marcel
[0] http://cs.ucla.edu/~kohler/class/08w-dsi/aguilera07sinfonia.pdf