Hi,

On Wed, Aug 17, 2011 at 1:51 PM, Jukka Zitting <[email protected]> wrote:
> Hi,
>
> On Wed, Aug 17, 2011 at 1:37 PM, Cosmin Lehene <[email protected]> wrote:
>> First I'll have to better understand what a bundle is :) (JCR newbie
>> here:)). I'll try to read about it.
>
> A bundle is the unit of data stored by a bundle persistence manager.
> It contains the properties and the list of child nodes of a single JCR
> node.
>
> A bundle persistence manager is expected to be able to atomically
> update not just a single bundle at a time, but an arbitrarily large
> ChangeLog of created, updated and deleted bundles. This has so far
> been a big problem for NoSQL-style persistence managers that only
> support locking at the level of individual rows.

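To make the atomicity requirement quoted above a bit more concrete, here is a rough sketch of what can go wrong when a store only guarantees atomic writes per row. All names (RowStore, apply_change_log, the bundle layout) are made up for illustration; none of this is Jackrabbit API, and the "crash" is simulated.

```python
# Hypothetical sketch: none of these names are Jackrabbit API.

class RowStore:
    """A toy key-value store where only single-row writes are atomic."""

    def __init__(self, fail_on=None):
        self.rows = {}
        self.fail_on = fail_on  # bundle id whose write simulates a crash

    def put(self, bundle_id, bundle):
        if bundle_id == self.fail_on:
            raise RuntimeError("simulated failure while writing " + bundle_id)
        self.rows[bundle_id] = bundle


def apply_change_log(store, change_log):
    """Naively write each bundle of a change log, one row at a time."""
    for bundle_id, bundle in change_log:
        store.put(bundle_id, bundle)


store = RowStore(fail_on="child")
change_log = [
    ("parent", {"children": ["child"]}),            # parent lists a new child node
    ("child", {"properties": {"title": "hello"}}),  # the child bundle itself
]

try:
    apply_change_log(store, change_log)
except RuntimeError:
    pass  # the save failed half way through

# The parent bundle was written but the child bundle was not, so the
# parent now points at a node that does not exist in the store.
partially_applied = "parent" in store.rows and "child" not in store.rows
```

Without some form of multi-row transaction (or a compensating mechanism layered on top), the store ends up in exactly the half-applied state that the ChangeLog contract is supposed to rule out.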
I think this is one of the biggest reasons why JCR 1.0 and 2.0 do not map "nicely" onto most popular NoSQL stores. IMO it's not just a Jackrabbit issue.

The other big problem would be search. While you can scale out nicely to huge numbers with some NoSQL stores, the search will not. This is partly an issue with the Lucene implementation in Jackrabbit, but the spec doesn't really "help" either. In a big NoSQL deployment you might want to defer the searches to an external clustered search engine (something Solr-like), but that would/could mean that the search updates lag behind the content. In other words: save first, index later.

Another problem could be the current clustering implementation, which requires a global write lock (handled through the database or a shared filesystem). Especially in a multi-geolocation deployment, a global write lock is not an option.

I don't think these issues can be easily "solved" by just implementing a different persistence manager. It would be interesting to see if we can come up with some kind of design plan for how JCR could work with a NoSQL store. Maybe some of that work has already started with the JR3/microkernel prototyping?

It could also be that you need to choose one NoSQL solution and then leverage all the facilities/services/functionality provided by that store. So fully use and exploit something like the Hadoop stack, the Amazon stack or even the GAE stack.

We do see more and more people who expect everything to work smoothly in the cloud and to scale nicely and elastically over multiple datacenters. In the coming years this will become a requirement and Jackrabbit should be ready for that.

Bart
