Re: Scalability/Clustering

David Nuescheler Thu, 07 Jul 2005 14:43:14 -0700

hi walter,

sounds very interesting...


> we are planning to use Jackrabbit in an e-learning project with a few
> hundred concurrent users, so I am a little concerned about
> scalability.
> Some figures we forecast for the first expansion stage:
>  1.000.000 Nodes
> 10.000.000 Properties (around 10 properties/node)
>      3.000 Named Users (about 10% concurrent)
we recently ran a test using jackrabbit and cqfs,
populating roughly 5m items (~500k nodes), and
even without an rdbms back end we did not
run into any issues. the performance of the persistence
layer did degrade over time, though.

> We are thinking of an n-tier architecture with a web and application layer,
> a repository layer and a database layer, with 2 or more nodes for each
> layer. There are both Java and .NET applications accessing the content
> in the repository, so we are planning to implement a .NET client for
> JSR170 too.
cool. i am currently trying to put together at least a common .NET port
of the API in jackrabbit (just like markus did for PHP).
would you be interested in helping with that?
i think a .NET client using the WebDAV JCR remoting could
be a very interesting option.
http://www.day.com/jsr170/server/JCR_Webdav_Protocol.zip

> What would be the best deployment model for such a situation in your
> opinion?
personally, i think that depends on the nature of the application.
the e-learning applications that i know do a lot of reading of "course
material" and a relatively limited number of write operations
(test results, user tracking, ...).

i think that repository-based content replication lends itself to
distributing the course material to multiple, entirely independent
cluster nodes.
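The replication idea above can be sketched as a pull-based change log: an authoring repository records every write to the course material in order, and each independent read-only node pulls and applies the entries it has not seen yet. This is a minimal illustration under assumed names (`AuthoringRepository`, `ReadOnlyReplica` and `ChangeEntry` are hypothetical, not Jackrabbit classes), with repository content reduced to a path-to-string map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one logged write to the course material.
final class ChangeEntry {
    final long revision;   // 1, 2, 3, ... in write order
    final String path;     // e.g. "/courses/java101/lesson1"
    final String content;  // null means the item was removed

    ChangeEntry(long revision, String path, String content) {
        this.revision = revision;
        this.path = path;
        this.content = content;
    }
}

// Hypothetical authoring node: the only place where course material is written.
final class AuthoringRepository {
    private final List<ChangeEntry> log = new ArrayList<>();
    private final Map<String, String> content = new HashMap<>();

    synchronized void put(String path, String value) {
        content.put(path, value);
        log.add(new ChangeEntry(log.size() + 1, path, value));
    }

    synchronized void remove(String path) {
        content.remove(path);
        log.add(new ChangeEntry(log.size() + 1, path, null));
    }

    // Entry i in the list has revision i + 1, so this returns every
    // entry with a revision greater than sinceRevision.
    synchronized List<ChangeEntry> changesSince(long sinceRevision) {
        return new ArrayList<>(log.subList((int) sinceRevision, log.size()));
    }
}

// Hypothetical read-only cluster node serving course material to users.
final class ReadOnlyReplica {
    private final Map<String, String> content = new HashMap<>();
    private long appliedRevision = 0;

    // Pulls and applies everything this replica has not seen yet, in order.
    void sync(AuthoringRepository source) {
        for (ChangeEntry e : source.changesSince(appliedRevision)) {
            if (e.content == null) {
                content.remove(e.path);
            } else {
                content.put(e.path, e.content);
            }
            appliedRevision = e.revision;
        }
    }

    String read(String path) { return content.get(path); }

    long appliedRevision() { return appliedRevision; }
}
```

Because the replicas only read, they never have to coordinate with each other; a replica that was offline simply catches up from its last applied revision on its next sync.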

as for the communication protocol between the clients and the repository,
i think that depending on the application either an rmi layer (for java,
obviously) or a webdav-based client may be a good choice.

> Are there any efforts to make jackrabbit clustered for a load sharing
> scenario (no session failover at repository layer) ?
i think there are a couple of caches in the jackrabbit core that need
to be made clusterable (or at least pluggable) for that to happen
efficiently. it has to be done very carefully, but i don't think it
should be too much work.
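One way to read "clusterable (or at least pluggable)" caches is a cache interface with a decorator that broadcasts invalidations to the other cluster nodes, so they re-read changed items from shared persistence. The sketch below assumes hypothetical names (`ItemCache`, `ClusteredItemCache`, `InvalidationBus`); it is not Jackrabbit's actual cache API, and the in-process bus stands in for a real network transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pluggable cache abstraction.
interface ItemCache {
    String get(String id);
    void put(String id, String value);
    void invalidate(String id);
}

// Plain per-node cache: what a non-clustered deployment would plug in.
final class LocalItemCache implements ItemCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    public String get(String id) { return entries.get(id); }
    public void put(String id, String value) { entries.put(id, value); }
    public void invalidate(String id) { entries.remove(id); }
}

// Stand-in for a network transport: delivers an invalidation to every
// registered member except the one that sent it.
final class InvalidationBus {
    private final List<ClusteredItemCache> members = new ArrayList<>();

    synchronized void join(ClusteredItemCache member) { members.add(member); }

    synchronized void broadcast(ClusteredItemCache sender, String id) {
        for (ClusteredItemCache m : members) {
            if (m != sender) {
                m.onRemoteInvalidation(id);
            }
        }
    }
}

// Decorator that makes any ItemCache cluster-aware: a local write
// evicts the entry on all peers instead of shipping the new value.
final class ClusteredItemCache implements ItemCache {
    private final ItemCache delegate;
    private final InvalidationBus bus;

    ClusteredItemCache(ItemCache delegate, InvalidationBus bus) {
        this.delegate = delegate;
        this.bus = bus;
        bus.join(this);
    }

    public String get(String id) { return delegate.get(id); }

    public void put(String id, String value) {
        delegate.put(id, value);
        bus.broadcast(this, id);
    }

    public void invalidate(String id) {
        delegate.invalidate(id);
        bus.broadcast(this, id);
    }

    // Remote invalidations only touch the local delegate, so they are
    // never re-broadcast.
    void onRemoteInvalidation(String id) { delegate.invalidate(id); }
}
```

Broadcasting invalidations rather than values keeps messages small, at the cost of an extra read from persistence on the peers that need the item next.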

this is definitely on the roadmap, and investigations in that
direction have already happened.

> After reading a lot of code, I think the following changes should do it:
> - extending ObservationManager to send and receive Events to
>   and from other nodes
maybe... personally i would like to have that functionality closer
to the core, to keep things as transactional as possible across
the cluster.
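Keeping cluster notifications "as transactional as possible" could mean that all changes from one save are collected while the save is in progress and shipped to the peers as a single bundle only after the local commit succeeds; a failed save ships nothing. This is a minimal sketch with hypothetical classes (`ClusterEvent`, `EventBundle`, `ClusterNodeEvents`), not `javax.jcr.observation` and not how Jackrabbit implements it; peers are connected in-process for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical change notification (not javax.jcr.observation.Event).
final class ClusterEvent {
    final String type;  // e.g. "NODE_ADDED", "PROPERTY_CHANGED"
    final String path;

    ClusterEvent(String type, String path) {
        this.type = type;
        this.path = path;
    }
}

// Everything one save() changed, delivered to peers as a single unit.
final class EventBundle {
    final String originNode;
    final List<ClusterEvent> events;

    EventBundle(String originNode, List<ClusterEvent> events) {
        this.originNode = originNode;
        this.events = Collections.unmodifiableList(new ArrayList<>(events));
    }
}

final class ClusterNodeEvents {
    final String name;
    final List<EventBundle> received = new ArrayList<>();
    private final List<ClusterNodeEvents> peers = new ArrayList<>();
    private final List<ClusterEvent> pending = new ArrayList<>();

    ClusterNodeEvents(String name) { this.name = name; }

    void connect(ClusterNodeEvents peer) { peers.add(peer); }

    // Called while a save is in progress; nothing leaves this node yet.
    void record(ClusterEvent e) { pending.add(e); }

    // Called only after the local commit succeeded: ship the whole bundle.
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        EventBundle bundle = new EventBundle(name, pending);  // copies pending
        pending.clear();
        for (ClusterNodeEvents p : peers) {
            p.deliver(bundle);
        }
    }

    // Called if the save failed: peers never hear about these changes.
    void rollback() { pending.clear(); }

    // Remote bundles are consumed locally and never re-broadcast,
    // which avoids event ping-pong between nodes.
    private void deliver(EventBundle bundle) { received.add(bundle); }
}
```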

> - implementing/extending an ORM Layer (Hibernate with shared caching for
>   performance). The persistence implementation should be aware of the
>   node types and allow a type-specific mapping to tables, so we can map
>   node types with many instances to their own tables while maintaining
>   flexibility for new "simple" node types.
i think that you may get a bigger performance gain by implementing
the shared cache at a higher layer in the jackrabbit architecture.
on a completely different note, some people would probably also like to map
nodetypes to tables for "aesthetic" reasons...
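The type-specific mapping proposed in the quoted list might look like this: node types registered as "heavy" get a dedicated table with one column per property, and every other node type falls back to a generic one-row-per-property table. The class, table names and node type names below are hypothetical, it only generates SQL strings (it is neither Hibernate nor a Jackrabbit persistence manager), and it assumes property names have already been turned into valid column names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping strategy: dedicated tables for node types with
// many instances, a generic property table for everything else.
final class NodeTypeTableMapper {
    static final String GENERIC_TABLE = "JCR_GENERIC_PROPS";
    private final Map<String, String> dedicatedTables = new HashMap<>();

    void mapToTable(String nodeType, String table) {
        dedicatedTables.put(nodeType, table);
    }

    String tableFor(String nodeType) {
        return dedicatedTables.getOrDefault(nodeType, GENERIC_TABLE);
    }

    // Builds an INSERT with '?' placeholders for a JDBC PreparedStatement.
    String insertSql(String nodeType, String... columnNames) {
        String table = tableFor(nodeType);
        if (table.equals(GENERIC_TABLE)) {
            // Generic table: the caller executes this once per property.
            return "INSERT INTO " + GENERIC_TABLE + " (NODE_ID, NAME, VALUE) VALUES (?, ?, ?)";
        }
        // Dedicated table: one row per node, one column per property.
        StringBuilder cols = new StringBuilder("NODE_ID");
        StringBuilder marks = new StringBuilder("?");
        for (String c : columnNames) {
            cols.append(", ").append(c);
            marks.append(", ?");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }
}
```

New "simple" node types need no schema change because they land in the generic table; only the registered high-volume types pay for a dedicated schema.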

> - extending LockManager to sync locks with other nodes
> - Lucene should be independent on each node but be aware of new nodes
>   and changes -> Events from ObservationManager
true.
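For the lock synchronization point, one simple option is a single lock registry that every cluster node consults, applying JSR-170-style rules: a path that is already locked cannot be locked again, a deep lock on an ancestor blocks locking a descendant, and a deep lock cannot be placed while a descendant is locked. `ClusterLockRegistry` is a hypothetical in-memory stand-in for whatever shared service would really hold this state; it is not Jackrabbit's `LockManager`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical central lock registry shared by all cluster nodes.
final class ClusterLockRegistry {
    private final Map<String, String> ownerByPath = new HashMap<>();  // path -> lock owner
    private final Set<String> deepLocks = new HashSet<>();

    synchronized boolean tryLock(String path, String owner, boolean deep) {
        for (String locked : ownerByPath.keySet()) {
            if (locked.equals(path)) {
                return false;  // already locked
            }
            if (deepLocks.contains(locked) && isAncestor(locked, path)) {
                return false;  // covered by an ancestor's deep lock
            }
            if (deep && isAncestor(path, locked)) {
                return false;  // a descendant is already locked
            }
        }
        ownerByPath.put(path, owner);
        if (deep) {
            deepLocks.add(path);
        }
        return true;
    }

    synchronized boolean unlock(String path, String owner) {
        if (!owner.equals(ownerByPath.get(path))) {
            return false;  // not locked, or locked by someone else
        }
        ownerByPath.remove(path);
        deepLocks.remove(path);
        return true;
    }

    // True if 'descendant' lies strictly below 'ancestor' in the tree.
    // The trailing '/' keeps "/a/b" from matching "/a/bc".
    static boolean isAncestor(String ancestor, String descendant) {
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        return !descendant.equals(ancestor) && descendant.startsWith(prefix);
    }
}
```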

> - Config - the cluster should have a central place for config management
sure. i think that's a nice-to-have, though ;)

> - some intelligence in the JCR-RMI client to find a content repository
>   node in the cluster depending on node state (load, shutdown, ...)
assuming that you are using rmi, yes.
if you are using webdav, you could rely on general-purpose
http load-balancing infrastructure, right?
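The client-side node selection described for the rmi case could be as simple as: skip nodes that are shutting down and pick the least-loaded one among the rest. The status class, its load metric (0.0 idle to 1.0 saturated) and the URLs are assumptions for illustration; how the client would obtain these figures is left open, and with webdav an ordinary http load balancer would make this decision instead.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Hypothetical per-node status as seen by a cluster-aware client.
final class RepositoryNodeStatus {
    final String rmiUrl;          // illustrative, e.g. "//repo1:1099/jackrabbit"
    final double load;            // assumed metric: 0.0 = idle, 1.0 = saturated
    final boolean shuttingDown;

    RepositoryNodeStatus(String rmiUrl, double load, boolean shuttingDown) {
        this.rmiUrl = rmiUrl;
        this.load = load;
        this.shuttingDown = shuttingDown;
    }
}

final class NodeSelector {
    private NodeSelector() {}

    // Picks the least-loaded node that is not shutting down;
    // returns an empty Optional if no node is available.
    static Optional<RepositoryNodeStatus> select(Collection<RepositoryNodeStatus> nodes) {
        return nodes.stream()
                .filter(n -> !n.shuttingDown)
                .min(Comparator.comparingDouble((RepositoryNodeStatus n) -> n.load));
    }
}
```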

> What else should be synchronized between the nodes?
> Did I overlook something?
i think this list sounds like a good start...

> I am happy about any suggestions, even if you discourage us from using
> jackrabbit. Of course we would release some of these developments to the
> community - if someone is interested.
sure, very interested ;)

> I recommend evaluating jackrabbit since I see much future for 
> the JSR170 standard ...
i am glad to hear that. i think going with jsr-170 also allows
the customer to change the implementation at a later date
if the requirements change drastically, while still protecting
all the investments made in the applications, clients, etc.
whether someone wants to use an open-source or a commercial
jsr-170-compliant content repository remains a question of
personal taste and total cost of ownership.

> ... but I am concerned about the mentioned scalability issue.
i am not too worried about that. i think the figures that you
specified are definitely doable if one is willing to spend a little
time on tweaking...

regards,
david
