It might not be feasible to run a full JCR implementation on top of NoSQL; I don't know yet. However, supporting basic search is definitely possible, and it should be fast as well. Whether search updates are synchronous (fully consistent) or asynchronous should be configurable. I assume some of the features (e.g. transactions or indexing) should be provided by the NoSQL store itself, and the persistence manager should work with the existing interfaces and data layout. That said, there is a relatively clear solution right now for JCR on top of HBase, and it should have enough features that someone looking for scalability could use it. I also think global write locks need to go away (at least for NoSQL persistence). Locking can be taken care of at a more granular level inside the actual store.
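
To make the synchronous versus asynchronous choice concrete, here is a minimal sketch, assuming an in-memory map stands in for the NoSQL table and a simple term index stands in for the search engine. Every class and method name (`IndexedStore`, `IndexMode`, `save`, `search`) is hypothetical and is not a Jackrabbit or HBase API. The sketch only handles newly saved text; it does not remove stale terms when a node is overwritten.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical store wrapper; illustrative only, not a real Jackrabbit/HBase class. */
class IndexedStore {
    enum IndexMode { SYNC, ASYNC }

    // Stands in for the NoSQL table: node id -> stored text.
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    // Stands in for the search index: term -> ids of nodes containing it.
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    private final ExecutorService indexer = Executors.newSingleThreadExecutor();
    private final IndexMode mode;

    IndexedStore(IndexMode mode) {
        this.mode = mode;
    }

    void save(String nodeId, String text) {
        rows.put(nodeId, text);
        if (mode == IndexMode.SYNC) {
            // Index is updated before save() returns: searches see the write immediately.
            indexNode(nodeId, text);
        } else {
            // "Save first, index later": searches may lag behind the content.
            indexer.submit(() -> indexNode(nodeId, text));
        }
    }

    private void indexNode(String nodeId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> ConcurrentHashMap.newKeySet()).add(nodeId);
            }
        }
    }

    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
    }

    /** Stops the background indexer after draining pending work (waits up to 5 seconds). */
    void close() {
        indexer.shutdown();
        try {
            indexer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In `SYNC` mode a search issued right after `save` finds the node; in `ASYNC` mode that is only guaranteed once the background indexer has caught up (for example after `close()`).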
Cosmin

On 8/18/11 3:14 PM, "Bart van der Schans" <[email protected]> wrote:

>Hi,
>
>On Wed, Aug 17, 2011 at 1:51 PM, Jukka Zitting <[email protected]>
>wrote:
>> Hi,
>>
>> On Wed, Aug 17, 2011 at 1:37 PM, Cosmin Lehene <[email protected]>
>>wrote:
>>> First I'll have to better understand what a bundle is :) (JCR newbie
>>> here:)). I'll try to read about it.
>>
>> A bundle is the unit of data stored by a bundle persistence manager.
>> It contains the properties and the list of child nodes of a single JCR
>> node.
>>
>> A bundle persistence manager is expected to be able to atomically
>> update not just a single bundle at a time, but an arbitrarily large
>> ChangeLog of created, updated and deleted bundles. This has so far
>> been a big problem for NoSQL-style persistence managers that only
>> support locking at the level of individual rows.
>
>I think this is one of the biggest reasons why JCR 1.0 and 2.0 do not
>match "nicely" to most popular NoSQL stores. Imo it's not just a
>Jackrabbit issue. The other big problem would be the search. As you
>can scale out nicely to huge numbers with some NoSQL stores, the
>search will not. This is partly an issue with the Lucene
>implementation in Jackrabbit, but also the spec doesn't really "help".
>In a big NoSQL deployment you might want to defer the searches to an
>external clustered search engine (something Solr-like), but that
>would/could mean that the search updates lag behind the content. Aka
>save first, index later. Another problem could be the current
>clustering implementation which requires a global write lock (which is
>handled through the database or shared filesystem). Especially in a
>multi geolocation deployment a global write lock is not an option.
>
>I don't think these issues can be easily "solved" by just implementing
>a different persistence manager. It would be interesting to see if we
>can come up with some kind of design plan of how JCR could work with a
>NoSQL store. Maybe some of that work already started with the
>JR3/microkernel prototyping? It could also be that you need to choose
>one NoSQL solution and then leverage all the
>facilities/services/functionality provided by the store. So fully use
>and exploit something like the Hadoop stack, the Amazon stack or even
>the GAE stack.
>
>We do see more and more people that expect everything to work smoothly
>in the cloud and that everything scales nicely and elastically over
>multiple datacenters. In the coming years this will become a
>requirement and Jackrabbit should be ready for that.
>
>Bart
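
The ChangeLog problem described in the quoted thread can be illustrated with a small sketch. It assumes a store where only single-row writes are atomic and shows what a naive, row-by-row application of a ChangeLog leaves behind when it is interrupted. All names (`Bundle`, `RowStore`, `NaiveChangeLogWriter`, `failAfter`) are hypothetical, the ChangeLog is simplified to a list of created or updated bundles, and the failure is simulated; this is an illustration of the problem, not a proposed persistence manager.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** One node's properties plus the ids of its child nodes (simplified, hypothetical). */
class Bundle {
    final String nodeId;
    final Map<String, String> properties;
    final List<String> childIds;

    Bundle(String nodeId, Map<String, String> properties, List<String> childIds) {
        this.nodeId = nodeId;
        this.properties = properties;
        this.childIds = childIds;
    }
}

/** Hypothetical row store: each put() is atomic, but there is no multi-row transaction. */
class RowStore {
    private final Map<String, Bundle> rows = new ConcurrentHashMap<>();
    private int writesBeforeFailure = Integer.MAX_VALUE;

    /** Test hook: allow this many writes, then simulate a crash or lost connection. */
    void failAfter(int writes) {
        this.writesBeforeFailure = writes;
    }

    void put(Bundle bundle) {
        if (writesBeforeFailure-- <= 0) {
            throw new IllegalStateException("simulated store failure");
        }
        rows.put(bundle.nodeId, bundle);
    }

    Bundle get(String nodeId) {
        return rows.get(nodeId);
    }
}

/** Applies a ChangeLog one row at a time; NOT atomic across bundles. */
class NaiveChangeLogWriter {
    static void apply(RowStore store, List<Bundle> changeLog) {
        for (Bundle bundle : changeLog) {
            store.put(bundle); // a failure here leaves earlier bundles already written
        }
    }
}
```

If a ChangeLog contains a parent bundle that lists a new child, followed by the child bundle itself, and the store fails after the first write, the parent row ends up referencing a child row that was never stored. Readers can also observe that intermediate state while the loop is still running, which is the gap between row-level atomicity and the ChangeLog-level atomicity a bundle persistence manager is expected to provide.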
