David Nuescheler wrote:
we recently ran a test using jackrabbit and cqfs,
populating roughly 5m items (~500k nodes), and
even without using an rdbms back end we did not
run into issues. the performance of the persistence
layer degraded over time though.
Don't you mean you got good performance because you were NOT using a
database? Although I've been a proponent of DB storage, I also know
that there will always be an overhead compared to raw file access. There
are other advantages though (as you've summarized here:
http://www.day.com/site/en/index/products/content-centric_infrastructure/content_repository/crx_faq.html
:) )
Are there any efforts to make Jackrabbit clusterable for a load-sharing
scenario (no session failover at the repository layer)?
i think there are a couple of caches that need to be made
clusterable (or at least pluggable) in the jackrabbit core for
that to happen efficiently. it has to be done very carefully,
but it should not be too much work i think.
this is definitely on the roadmap, and investigations in that
direction have already happened.
From what I have seen, making the cache implementation pluggable would
be a necessary first step. It would then become possible to plug in OSCache,
JBossTreeCache or Tangosol Coherence, which all support clustered caches.
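To make that concrete, here is a rough sketch of what such a pluggable
cache interface might look like. The names are made up for illustration,
this is not existing Jackrabbit API:

    // A hypothetical cache SPI -- it just illustrates the seam a
    // clustered implementation could plug into.
    public interface ItemCache<K, V> {
        V get(K key);              // returns null on a miss
        void put(K key, V value);  // a clustered impl would replicate or
                                   // invalidate its peers here
        void evict(K key);         // explicit invalidation, e.g. after a
                                   // write on another cluster node
    }

    // A trivial local default, roughly what a non-clustered core would use.
    class LocalItemCache<K, V> implements ItemCache<K, V> {
        private final java.util.Map<K, V> map =
                java.util.Collections.synchronizedMap(new java.util.HashMap<K, V>());
        public V get(K key)             { return map.get(key); }
        public void put(K key, V value) { map.put(key, value); }
        public void evict(K key)        { map.remove(key); }
    }

A Coherence- or TreeCache-backed implementation would then simply be
another ItemCache plugged in behind the same interface.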
- implementing/extending an ORM layer (Hibernate with shared caching for
performance). The persistence implementation should be aware of the
node types and allow a type-specific mapping to tables, so that node
types with many instances can be mapped to their own tables while
keeping the flexibility for new "simple" node types (see the sketch
below).
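To illustrate the table-per-node-type idea, here is a minimal sketch of
how node types might be routed to tables. All names below are
hypothetical; this is not the current ORM code:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical routing of node types to tables: high-volume types
    // get a dedicated table, everything else falls back to a generic one.
    class NodeTypeTableMapper {
        private static final String GENERIC_TABLE = "NODE"; // for "simple" node types
        private final Map<String, String> typeToTable = new HashMap<String, String>();

        // register a dedicated table for a node type with many instances
        void mapType(String nodeTypeName, String tableName) {
            typeToTable.put(nodeTypeName, tableName);
        }

        // resolve which table a node of the given type is stored in
        String tableFor(String nodeTypeName) {
            String table = typeToTable.get(nodeTypeName);
            return table != null ? table : GENERIC_TABLE;
        }
    }

So, for example, mapType("my:article", "ARTICLE_NODE") would send all
my:article nodes to their own table, while a newly registered ad-hoc
node type would still land in the generic NODE table, and a shared
(second-level) cache could sit in front of either mapping.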
One quick note about the current ORM implementation: the one I've worked
on for Jackrabbit can certainly be improved. Feel free to have a look
and contribute! But what David is saying is true: for performance, the
higher you can cache, the better!
Regards,
Serge Huber.