Hi all,

We are planning to use Jackrabbit in an e-learning project with a few
hundred concurrent users, so I am a little concerned about
scalability.

Some figures we forecast for the first expansion stage:
 1.000.000 Nodes
10.000.000 Properties (around 10 properties/node)
     3.000 Named Users (about 10% concurrent)

We are thinking of an n-tier architecture with a web and application
layer, a repository layer and a database layer, with 2 or more nodes for
each layer. Both Java and .NET applications access the content in the
repository, so we are also planning to implement a .NET client for
JSR 170.

What would be the best deployment model for such a situation in your
opinion?

Are there any efforts to make Jackrabbit clustered for a load-sharing
scenario (no session failover at the repository layer)?

After reading a lot of code, I think the following changes should do it:

- extending ObservationManager to send and receive Events to
  and from other nodes

- implementing/extending an ORM layer (Hibernate with shared caching for
  performance). The persistence implementation should be aware of the
  node types and allow a type-specific mapping to tables. That way we can
  map node types with many instances to their own tables while keeping
  flexibility for new "simple" node types.

- extending LockManager to sync locks with other cluster nodes

- the Lucene index should be independent on each cluster node but be
  aware of new nodes and changes -> events from ObservationManager

- Config - the cluster should have a central place for config management

- some intelligence in the JCR-RMI client to find a content repository
  node in the cluster depending on node state (load, shutdown, ...)
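
To make the last point more concrete, here is a minimal sketch of how a
client could choose a repository node. It is only an illustration: the
class ClusterNodeSelector, the NodeState fields and the example URLs are
all hypothetical and exist neither in Jackrabbit nor in JCR-RMI. It also
assumes every cluster node publishes its load and shutdown flag somewhere
the client can read them.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these names are part of Jackrabbit or JCR-RMI.
final class ClusterNodeSelector {

    /** State a repository node is assumed to report about itself. */
    public static final class NodeState {
        public final String rmiUrl;        // made-up example address
        public final double load;          // assumed scale: 0.0 idle, 1.0 saturated
        public final boolean shuttingDown; // node no longer accepts new sessions

        public NodeState(String rmiUrl, double load, boolean shuttingDown) {
            this.rmiUrl = rmiUrl;
            this.load = load;
            this.shuttingDown = shuttingDown;
        }
    }

    /** Returns the least-loaded node that is not shutting down, if there is one. */
    public static Optional<NodeState> pick(List<NodeState> nodes) {
        return nodes.stream()
                .filter(n -> !n.shuttingDown)
                .min(Comparator.comparingDouble(n -> n.load));
    }

    public static void main(String[] args) {
        List<NodeState> nodes = Arrays.asList(
                new NodeState("rmi://repo1.example:1099/jackrabbit", 0.8, false),
                new NodeState("rmi://repo2.example:1099/jackrabbit", 0.3, false),
                new NodeState("rmi://repo3.example:1099/jackrabbit", 0.1, true));

        // repo3 has the lowest load but is shutting down, so repo2 is chosen.
        System.out.println(pick(nodes)
                .map(n -> n.rmiUrl)
                .orElse("no repository node available"));
    }
}
```

The client would then open its session against the returned address and
call pick() again when a node goes away. How the node state is
distributed (central config, multicast, ...) is left open here.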

What else should be synchronized between the nodes?
Did I overlook something?

I am happy about any suggestions, even if you discourage us from using
Jackrabbit. Of course we would release some of these developments to the
community, if anyone is interested.

thx in advance,

cheers
Walter


