Hi,

> >> What are the consistency assumptions a JCR client should be allowed to
> >> make?
> >>
> >> An approach where temporary inconsistencies are tolerated (i.e. eventual
> >> consistency) increases availability and throughput. In such a case
> >> do/can/should we tolerate temporary violations of:
> >>
> >> - Node type constraints?
> >
> > so far we seem to have only discussed edge cases where node type
> > constraints could be violated. I think, they are not too relevant in
> > a real life system. I'd be OK to make some compromises in this area.
>
> With the current Microkernel whether these cases (i.e. write skew) [1]
> are edge case or not depends on the degree of write concurrency we
> anticipate. If we fully synchronize all writes, these cases wont occur
> at all. If OTOH we aim for highly concurrent writes, we will see such
> cases possibly more often than we like.
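To make the write skew case concrete, here is a minimal sketch. It is a toy snapshot-isolation store, not the Microkernel API: the names Store, Txn and run_write_skew, the keys "type" and "extra", and the rule that a "strict" node may not carry an "extra" property are all made up for illustration.

```python
class Store:
    """Toy versioned store with snapshot isolation (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # per-key commit counter

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)   # reads see this frozen copy
        self.seen = dict(store.version)    # key versions at snapshot time
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only write-write conflicts are detected; reads are not validated.
        for key in self.writes:
            if self.store.version[key] != self.seen[key]:
                raise RuntimeError("conflict on " + key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] += 1


def run_write_skew():
    # Assumed invariant: a node of type "strict" must not have "extra" set.
    store = Store({"type": "open", "extra": None})
    t1, t2 = store.begin(), store.begin()

    # Session 1 checks that "extra" is absent, then tightens the type.
    if t1.read("extra") is None:
        t1.write("type", "strict")

    # Session 2 checks that the type is still "open", then adds "extra".
    if t2.read("type") == "open":
        t2.write("extra", "x")

    # The write sets are disjoint, so both commits succeed.
    t1.commit()
    t2.commit()
    return store.data
```

Each session's check holds on its own snapshot, yet the merged result is a "strict" node with "extra" set. Serializing all writes, or validating reads at commit time, would reject the second commit.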

I think most applications that have highly concurrent writes usually
distribute the writes across many nodes, e.g. you have lots of users
working with the system, but each of them is working with his/her own
dataset.

I think conflicts are likely (even with low concurrency) when nodes are
added and/or removed on the same parent. These kinds of conflicts should
IMO be resolved efficiently and consistently. As you mentioned on the
wiki page, these kinds of concurrent changes are usually not incompatible
and can be merged.

To me the example on the wiki page is a reason to drop support for
setPrimaryType() for jr3. The specification says:

"10.10.2 Updating a Node's Primary Type

A repository /may/ permit the primary type of a node to be changed
during its lifetime. Repositories are free to limit the scope of
permitted changes both in terms of which nodes may be changed and which
changes are allowed."

Do we have other examples where we know consistency from a JCR
perspective is at risk?

> [1]
> http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
>
> >> - Access control rights?
> >
> > I don't think any violations are acceptable here.
>
> Me neither. But again we need to be aware of the write skew issue here:
> an ACL implementation must be very careful about its consistency
> assumptions or it will eventually fail.
>
> >> - Lock enforcement?
> >
> > that's definitively a tough one because it depends on repository
> > wide state.
>
> This is an area where Apache Zookeeper might help out.
>
> >> - Query index consistency?
> >
> > I think consistency is a prerequisite here, otherwise it's quite
> > difficult to implement the query functionality. I'd rather
> > make compromises for availability. eg. terminate a long query
> > execution with an exception because the snapshot it was
> > working on is not available anymore.
>
> I was more thinking of the other direction: would it be tolerable to
> have the query index not up to date yet? (i.e. after a possibly large
> save.) Again, this could either result in incomplete query results, an
> exception or the query to be deferred until the index is up to date.
> Maybe we could even let the client chose through 'query hints'.

I like the query hint idea. Alternatively, we could deny access to the
most recent revision until the index is updated (possibly
asynchronously). This way reads and writes are fast at the cost of
consistency. Reads would be eventually consistent (once the index is
updated).

regards
 marcel

> Michael
>
> >
> >> - Atomicity of save operations?
> >
> > how does a temporary violation of atomic saves look like?
> > are you thinking of partially visible changes?
> >
> > regards
> > marcel
> >
> >> - ...?
> >>
> >> Should we offer alternatives in some of these cases? That is, give the
> >> client the ability to choose between consistency and availability.
> >>
> >> Michael
> >>
> >>
> >> [1]
> >> http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203
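
PS: a minimal sketch of the idea of hiding the most recent revision until the index has caught up. This is a toy model only. The Repository class and its save, update_index, read and query methods are hypothetical names, not an existing Jackrabbit or Microkernel API, and the asynchronous indexer is replaced by an explicit call.

```python
class Repository:
    """Toy model: saves create revisions, the index catches up later."""

    def __init__(self):
        self.revisions = [{}]   # revision 0 is the empty repository
        self.indexed = 0        # highest revision covered by the index
        self.index = {}         # property value -> set of paths

    def save(self, changes):
        # Writes are fast: they append a new head revision and return.
        head = dict(self.revisions[-1])
        head.update(changes)
        self.revisions.append(head)
        return len(self.revisions) - 1

    def update_index(self):
        # Stands in for an asynchronous indexer; called explicitly here.
        head = len(self.revisions) - 1
        index = {}
        for path, value in self.revisions[head].items():
            index.setdefault(value, set()).add(path)
        self.index, self.indexed = index, head

    def read(self, path):
        # Readers see the latest indexed revision, not the head.
        return self.revisions[self.indexed].get(path)

    def query(self, value):
        # Query results always match what read() exposes.
        return sorted(self.index.get(value, set()))


repo = Repository()
repo.save({"/a": "red", "/b": "blue"})
before = (repo.read("/a"), repo.query("red"))   # save not visible yet
repo.update_index()
after = (repo.read("/a"), repo.query("red"))    # save visible and indexed
```

Reads and queries stay consistent with each other at every point, but a session may not see its own save until the indexer has run. That is the eventual consistency trade-off described above.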
