hi walter, sounds very interesting...
> we just plan to use JackRabbit in an e-learning project with a few
> hundred concurrent users. Therefore I am a little concerned about
> scalability.
> Some figures we forecast for the first expansion stage:
> 1.000.000 Nodes
> 10.000.000 Properties (around 10 properties/node)
> 3.000 Named Users (about 10% concurrent)

we just recently ran a test using jackrabbit and cqfs, populating roughly
5m items (~500k nodes), and even without using an rdbms back end we did
not run into issues. the performance of the persistence layer degraded
over time though.

> We think of an n-tier architecture with a web and application layer, a
> repository layer and the database layer with 2 or more nodes for each
> layer. There are both Java and .net applications accessing the content
> in the repository, so we are planning to implement a .net client for
> JSR170 too.

cool. i am currently trying to get at least a common .NET port of the API
put together in jackrabbit (just like markus did it for PHP).
are you interested in helping with that?
i think a .NET client using the WebDAV JCR remoting could be a very
interesting option.
http://www.day.com/jsr170/server/JCR_Webdav_Protocol.zip

> What would be the best deployment model for such a situation in your
> opinion?

personally, i think that it depends on the nature of the application.
the e-learning applications that i know do a lot of reading for "course
material" and a relatively limited amount of writing operations (test
results, user tracking, ...).
i think that repository based content replication lends itself to
distributing the course material to multiple entirely independent cluster
nodes.
with respect to the communication protocol of the clients, i think that
depending on the application either an rmi-layer (for java obviously) or
a webdav-based client may be a good choice.

> Are there any efforts to make jackrabbit clustered for a load sharing
> scenario (no session failover at repository layer) ?
i think there are a couple of caches that need to be made clusterable (or
at least pluggable) in the jackrabbit core for that to happen efficiently.
it has to be done very carefully, but it should not be too much work i
think. this is definitely on the roadmap and investigations in that
direction have already happened.

> After reading a lot of code, I think following changes should do it:
> - extending ObservationManager to send and receive Events to
> and from other nodes

maybe... personally i would like to have that functionality closer to the
core, to keep things as transactional as possible across the cluster.

> - implementing/extending an ORM Layer (Hibernate with shared caching for
> performance). The persistence implementation should be aware of the
> node types and allow a type specific mapping to tables. So we can map
> nodetypes with many instances to own tables while maintaining
> flexibility for new "simple" nodetypes.

i think that you may get a better performance impact by implementing the
shared cache on a higher layer in the jackrabbit architecture.
on a completely different note, some people probably also like to map
nodetypes to tables for "aesthetic" reasons...

> - extending LockManager to sync locks with other Nodes
> - Lucene should be independent on each node but be aware of new nodes
> and changes -> Events from ObservationManager

true.

> - Config - the cluster should have a central place for config management

sure. i think that's a nice to have though ;)

> - some intelligence in the JCR-RMI client to find a content repository
> node from the cluster depending on node state (load, shutdown, ...)

assuming that you are using rmi, yes. if you are using webdav you may be
using general http-loadbalancing infrastructure, right?

> What else should be synchronized between the nodes?
> Did I overlook something?

i think this list sounds like a good start...

> I am happy about any suggestions even if you discourage us from using
> jackrabbit.
> Of course we would release some of these developments to the
> community - if someone is interested.

sure, very interested ;)

> I recommend evaluating jackrabbit since I see much future for
> the JSR170 standard ...

i am glad to hear that. i think going with jsr-170 also allows the customer
(at a later date) to even change the implementation if the requirements
should change drastically, while still protecting all the investments made
into the applications, clients, etc...
whether someone wants to use an opensource or a commercial jsr-170
compliant content repository remains a question of personal taste and
total cost of ownership.

> ... but I am concerned about the mentioned scalability issue.

i am not too worried about that. i think the metrics that you specified
are definitely very doable if one is willing to spend a little bit of
time on tweaking ...

regards,
david
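
p.s. to make the point about clusterable caches a bit more concrete, here
is a rough sketch of the general idea: every repository node keeps its own
item cache, and a write on one node evicts the stale entry on its peers.
none of this is jackrabbit code. the names ClusterCacheSketch, Cluster and
RepoNode are hypothetical, the "cluster" is just an in-process list, and
the shared store is a plain map standing in for the real persistence
layer. it is also not transactional, which is exactly the part that would
need care in a real implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch only; these classes do not exist in Jackrabbit.
class ClusterCacheSketch {

    /** Stand-in for the shared persistence layer plus the cluster transport. */
    static class Cluster {
        final Map<String, String> store = new ConcurrentHashMap<>();
        final List<RepoNode> members = new CopyOnWriteArrayList<>();

        /** Creates a new repository node and registers it with the cluster. */
        RepoNode join(String name) {
            RepoNode node = new RepoNode(name, this);
            members.add(node);
            return node;
        }
    }

    /** One repository instance with its own local item cache. */
    static class RepoNode {
        final String name;
        final Cluster cluster;
        final Map<String, String> cache = new ConcurrentHashMap<>();

        RepoNode(String name, Cluster cluster) {
            this.name = name;
            this.cluster = cluster;
        }

        /** Reads through the local cache, loading from the shared store on a miss. */
        String read(String itemId) {
            return cache.computeIfAbsent(itemId, cluster.store::get);
        }

        /** Writes to the shared store, then evicts the item from every peer's cache. */
        void write(String itemId, String value) {
            cluster.store.put(itemId, value);
            cache.put(itemId, value);
            for (RepoNode peer : cluster.members) {
                if (peer != this) {
                    peer.cache.remove(itemId);
                }
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        RepoNode a = cluster.join("a");
        RepoNode b = cluster.join("b");

        a.write("/course/intro", "v1");
        System.out.println(b.read("/course/intro")); // b now caches v1

        a.write("/course/intro", "v2");              // evicts b's cached copy
        System.out.println(b.read("/course/intro")); // b reloads from the store
    }
}
```

in a read-heavy setup like the course material case, almost all reads
would be served from the local cache and only the comparatively rare
writes would cause traffic between the nodes.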
