Marcel Reutegger wrote:
Vadim Gritsenko wrote:
Edgar Poce wrote:
When I decided to write the JDBC PM proposed in JCR-91, I wanted:
1 - a mature, transactional and scalable persistence storage
2 - use rdbms administrative tools, like scheduled backups, etc.
3 - rdbms referential integrity
4 - avoid redundancy. PMs store the NodeReferences twice.
5 - a storage that allows modifying the data easily, just in case.
I need at least 1 and 2, plus clustering on top of that... None of the
existing PMs will work in a clustered environment (OJB and Hibernate do
not count).
Please note that clustering Jackrabbit is not just about the persistence
manager. It also involves many other areas that we need to take care of.
I know. But having a transactional, clustered PM will enable me to create a cluster
of Level 1 repository instances running on app servers. The next step could be
enabling flushing/synchronization of caches on those Level 1 instances. And
after all that is done, full clustering (with distributed locking, etc.) will be
easier to tackle.
See: http://issues.apache.org/jira/browse/JCR-169 for a starting point
on discussions about this topic.
Thanks for the pointer.
Why wait for the release? :-) Isn't contrib meant to be a playground for
experimental code? :-) Let's bring it up before that; SimpleDB isn't
usable either:
* Synchronized to death
* Stores BLOBs locally
Feel free to provide patches to enhance concurrency.
Then my first patch will be a port of the connection pool from Edgar's JDBC PM. Once
the DB PM has access to a DB connection pool, there will be no need for any
synchronization. Would you accept it?
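A minimal sketch of what such a pool might look like, assuming a hypothetical `SimplePool` class (this is not the pool from Edgar's JDBC PM, whose code isn't shown here). In a PM, `T` would be `java.sql.Connection`; it is kept generic so the sketch runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical bounded pool (not Edgar's actual code). In a persistence
 * manager T would be java.sql.Connection; generic here so it runs without a DB.
 */
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int maxSize;
    private int created = 0; // guarded by "this"

    SimplePool(int maxSize, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxSize);
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Reuse an idle resource, create a new one while under the limit, else wait. */
    T borrow(long timeoutMillis) throws InterruptedException {
        T t = idle.poll();
        if (t != null) {
            return t;
        }
        boolean mayCreate;
        synchronized (this) {
            mayCreate = created < maxSize;
            if (mayCreate) {
                created++;
            }
        }
        if (mayCreate) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                synchronized (this) {
                    created--; // creation failed, give the slot back
                }
                throw e;
            }
        }
        t = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (t == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return t;
    }

    /** Hand the resource back so another caller can reuse it. */
    void release(T t) {
        idle.offer(t);
    }
}
```

A PM operation would borrow a connection, do its work, and release it in a `finally` block, so concurrent callers work on separate connections instead of queuing on one `synchronized` method. The only lock left guards the pool's own counter.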
Some enhancements that crossed my mind are:
- use a separate read-only connection for load() and exists() operations
- use a pool of prepared statements for load() and exists()
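Both ideas could be sketched roughly like this, assuming a hypothetical `ReadOnlyAccess` helper and made-up table/column names (`NODE`, `NODE_ID`); only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical read path for load()/exists(): one dedicated read-only
 * connection plus a cache of prepared statements keyed by SQL text.
 * Table and column names are made up for illustration.
 */
class ReadOnlyAccess {
    static final String EXISTS_SQL = "select 1 from NODE where NODE_ID = ?";

    private final Connection con;
    private final Map<String, PreparedStatement> cache = new HashMap<>();

    ReadOnlyAccess(Connection con) throws SQLException {
        con.setReadOnly(true); // a hint to the driver that no writes happen here
        this.con = con;
    }

    /** Returns the cached statement for this SQL, preparing it on first use. */
    synchronized PreparedStatement statement(String sql) throws SQLException {
        PreparedStatement ps = cache.get(sql);
        if (ps == null) {
            ps = con.prepareStatement(sql);
            cache.put(sql, ps);
        }
        return ps;
    }

    /** exists() runs on the read-only connection, never on the write connection. */
    synchronized boolean exists(String id) throws SQLException {
        PreparedStatement ps = statement(EXISTS_SQL);
        ps.setString(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next();
        }
    }
}
```

Reads still serialize on this helper, because one JDBC statement should not be used by two threads at once, but they no longer wait behind writes on the write connection. Whether the explicit cache pays off depends on whether the driver already caches statements.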
There are issues with the single/double-connection design, besides the fact that
(J2EE) applications are discouraged from managing system resources themselves:
* No transaction isolation, which creates the need for synchronization
* No keep-alive monitoring
* No ability to reconnect a severed connection
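Keep-alive monitoring and reconnection are usually done when a connection is checked out of the pool. A rough sketch, assuming a hypothetical `Revalidator` helper: for JDBC, `isAlive` might wrap `Connection.isValid(int)` (JDBC 4.0, catching its `SQLException`), `opener` a `DriverManager.getConnection(...)` call, and `discard` a quiet `close()`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical keep-alive/reconnect step run when a pooled resource is
 * checked out. For JDBC, isAlive might wrap Connection.isValid(int),
 * opener a DriverManager.getConnection(...) call, and discard a quiet close().
 */
final class Revalidator<T> {
    private final Predicate<T> isAlive;
    private final Supplier<T> opener;
    private final Consumer<T> discard;
    private final AtomicInteger reconnects = new AtomicInteger();

    Revalidator(Predicate<T> isAlive, Supplier<T> opener, Consumer<T> discard) {
        this.isAlive = isAlive;
        this.opener = opener;
        this.discard = discard;
    }

    /** Returns the candidate if it still works, otherwise a freshly opened replacement. */
    T checkOut(T candidate) {
        if (candidate != null && isAlive.test(candidate)) {
            return candidate;
        }
        if (candidate != null) {
            discard.accept(candidate); // drop the severed resource
        }
        reconnects.incrementAndGet();
        return opener.get();
    }

    /** Number of replacements opened so far. */
    int reconnects() {
        return reconnects.get();
    }
}
```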
As for statement caching, IIRC the driver does this.
With those changes we can then loosen some of the synchronization.
BLOBs are stored locally because many DBs are known for their bad
performance when it comes to handling streams. So, speaking of
enhancements, introducing a configuration choice for BLOB handling is
probably another one.
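A configuration choice for BLOB handling could look roughly like this. The `BlobStore` interface, the `"local"`/`"db"` mode names and the `BLOBS` table are all hypothetical; only standard `java.nio.file` and `java.sql` calls are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical abstraction over where BLOB values live. */
interface BlobStore {
    void put(String id, byte[] data) throws IOException;
    byte[] get(String id) throws IOException;
}

/** Keeps BLOBs as files on the local disk (ids assumed to be safe file names). */
class LocalBlobStore implements BlobStore {
    private final Path dir;
    LocalBlobStore(Path dir) { this.dir = dir; }

    public void put(String id, byte[] data) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve(id), data);
    }

    public byte[] get(String id) throws IOException {
        return Files.readAllBytes(dir.resolve(id));
    }
}

/** Keeps BLOBs in a database table; table and column names are made up. */
class DbBlobStore implements BlobStore {
    private final Connection con;
    DbBlobStore(Connection con) { this.con = con; }

    public void put(String id, byte[] data) throws IOException {
        String sql = "insert into BLOBS (BLOB_ID, BLOB_DATA) values (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, id);
            ps.setBinaryStream(2, new ByteArrayInputStream(data), data.length);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }

    public byte[] get(String id) throws IOException {
        String sql = "select BLOB_DATA from BLOBS where BLOB_ID = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new IOException("no BLOB with id " + id);
                }
                return rs.getBytes(1);
            }
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}

/** Picks the implementation from a configuration value such as "local" or "db". */
final class BlobStores {
    static BlobStore create(String mode, Path dir, Connection con) {
        switch (mode) {
            case "local": return new LocalBlobStore(dir);
            case "db": return new DbBlobStore(con);
            default: throw new IllegalArgumentException("unknown BLOB mode: " + mode);
        }
    }
}
```

With `"db"`, every repository instance would see the same BLOBs, at the cost of stream performance on some databases; `"local"` keeps the current behaviour.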
Locally stored BLOBs might be OK for a non-clustered environment. They might even be
OK in some cluster deployments, if there is a replication mechanism.
But I don't think it is a good idea to replicate the full set of BLOBs to every
server (multiple times, if a server runs more than one webapp) that happens
to need access to the repository. I prefer having all BLOBs in one place,
even if it is a bit slower...
Vadim