Cosmin,

Just an FYI: I have implemented JCR on top of NoSQL using Basho's Riak. I will have to check my code, but the key routines are loadBundle, storeBundle, destroyBundle and the 'refs' routines. I started with DerbyPersistenceManager under ...pool, which implements the abstract bundle persistence manager. The cool thing is that it does all the serialization/deserialization for you.

I have not implemented blobs yet, but Riak has Luwak blobs, so I may incorporate those. I am still working through the tests and have yet to benchmark.
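To illustrate the key routines mentioned above, here is a minimal sketch of how they reduce to key/value operations. It is not the actual implementation: the class names KeyValueBucket and KeyValueBundleStore, the String ids, the byte[] payloads and the three references methods are assumptions made for illustration, and an in-memory map stands in for a Riak bucket. None of the signatures are Jackrabbit's or Riak's real API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Riak bucket: a plain in-memory key/value map.
// A real persistence manager would call the Riak client here instead.
class KeyValueBucket {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    byte[] get(String key) {
        return data.get(key);
    }

    void put(String key, byte[] value) {
        // Store a copy so later changes to the caller's array are not visible.
        data.put(key, value.clone());
    }

    void delete(String key) {
        data.remove(key);
    }
}

// Simplified sketch of the per-bundle routines. The bundle method names follow
// the ones mentioned in the message; all signatures are illustrative assumptions.
class KeyValueBundleStore {
    private final KeyValueBucket bundles = new KeyValueBucket();
    private final KeyValueBucket refs = new KeyValueBucket();

    // Returns the serialized bundle for a node id, or null if none is stored.
    byte[] loadBundle(String nodeId) {
        return bundles.get(nodeId);
    }

    // Writes (or overwrites) the serialized bundle of a single node.
    void storeBundle(String nodeId, byte[] serializedBundle) {
        bundles.put(nodeId, serializedBundle);
    }

    // Removes the bundle of a single node.
    void destroyBundle(String nodeId) {
        bundles.delete(nodeId);
    }

    // The 'refs' routines (hypothetical names): same pattern, separate bucket.
    byte[] loadReferences(String targetId) {
        return refs.get(targetId);
    }

    void storeReferences(String targetId, byte[] serializedRefs) {
        refs.put(targetId, serializedRefs);
    }

    void destroyReferences(String targetId) {
        refs.delete(targetId);
    }
}
```

Each call here touches exactly one key, so the sketch says nothing about updating several bundles atomically; that would need separate handling.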
Since Riak has an HTTP API, it opens up all kinds of cool possibilities.

On Aug 18, 2011, at 8:07 AM, Cosmin Lehene wrote:

> It might not be feasible to have a full JCR on top of NoSQL, I don't know
> yet.
> However supporting basic search is definitely possible and it should be
> fast as well. Whether that's synchronous (fully consistent) or
> asynchronous should be optional.
> I assume some of the features (e.g. transactions or indexing) should be
> available in the NoSQL store and the persistence manager should deal with
> existing interfaces and data layout.
> However there's a relatively clear solution right now for JCR on top of
> HBase and it should have enough features so that someone looking for
> scalability could use it.
> I also think global write locks need to go away (at least for NoSQL
> persistence). This can be taken care of at a more granular level inside
> the actual store.
>
> Cosmin
>
> On 8/18/11 3:14 PM, "Bart van der Schans" <[email protected]>
> wrote:
>
>> Hi,
>>
>> On Wed, Aug 17, 2011 at 1:51 PM, Jukka Zitting <[email protected]>
>> wrote:
>>> Hi,
>>>
>>> On Wed, Aug 17, 2011 at 1:37 PM, Cosmin Lehene <[email protected]>
>>> wrote:
>>>> First I'll have to better understand what a bundle is :) (JCR newbie
>>>> here:)). I'll try to read about it.
>>>
>>> A bundle is the unit of data stored by a bundle persistence manager.
>>> It contains the properties and the list of child nodes of a single JCR
>>> node.
>>>
>>> A bundle persistence manager is expected to be able to atomically
>>> update not just a single bundle at a time, but an arbitrarily large
>>> ChangeLog of created, updated and deleted bundles. This has so far
>>> been a big problem for NoSQL-style persistence managers that only
>>> support locking at the level of individual rows.
>>
>> I think this is one of the biggest reasons why JCR 1.0 and 2.0 do not
>> match "nicely" to most popular NoSQL stores. Imo it's not just a
>> Jackrabbit issue. The other big problem would be the search. While you
>> can scale out nicely to huge numbers with some NoSQL stores, the
>> search will not. This is partly an issue with the Lucene
>> implementation in Jackrabbit, but also the spec doesn't really "help".
>> In a big NoSQL deployment you might want to defer the searches to an
>> external clustered search engine (something Solr-like), but that
>> would/could mean that the search updates lag behind the content. Aka
>> save first, index later. Another problem could be the current
>> clustering implementation, which requires a global write lock (which is
>> handled through the database or shared filesystem). Especially in a
>> multi-geolocation deployment a global write lock is not an option.
>>
>> I don't think these issues can be easily "solved" by just implementing
>> a different persistence manager. It would be interesting to see if we
>> can come up with some kind of design plan of how JCR could work with a
>> NoSQL store. Maybe some of that work already started with the
>> JR3/microkernel prototyping? It could also be that you need to choose
>> one NoSQL solution and then leverage all the
>> facilities/services/functionality provided by the store. So fully use
>> and exploit something like the Hadoop stack, the Amazon stack or even
>> the GAE stack.
>>
>> We do see more and more people that expect everything to work smoothly
>> in the cloud and that everything scales nicely and elastically over
>> multiple datacenters. In the coming years this will become a
>> requirement and Jackrabbit should be ready for that.
>>
>> Bart
>
