Marcel Reutegger wrote:
Vadim Gritsenko wrote:

Edgar Poce wrote:

When I decided to write the JDBC PM proposed in JCR-91, I wanted:

1 - a mature, transactional and scalable persistence storage
2 - use rdbms administrative tools, like scheduled backups, etc.
3 - rdbms referential integrity
4 - avoid redundancy. PMs store the NodeReferences twice.
5 - storage that allows the data to be modified easily, just in case.

I need at least 1 and 2, and clustering on top of that... None of the existing PMs will work in a clustered environment (OJB and Hibernate do not count).

Please note that clustering Jackrabbit is not just about the persistence manager. It also involves many other areas that we need to take care of.

I know. But having a transactional, clustered PM will enable me to create a cluster of Level 1 repository instances and run them on app servers. The next step can be enabling flushing/synchronization of caches on those Level 1 instances. And after all that is done, full clustering (with distributed locking, etc.) will be easier to tackle.


See: http://issues.apache.org/jira/browse/JCR-169 for a starting point on discussions about this topic.

Thanks for the pointer.


Why wait for the release? :-) Isn't contrib meant to be the place for experimental code? :-) Let's bring it up before that. SimpleDB isn't usable either:

  * Synchronized to death
  * Stores BLOBs locally


Feel free to provide patches to enhance concurrency.

My first patch, then, will be a port of the connection pools from Edgar's JDBC PM. Once the DB PM has access to a DB connection pool, there will be no need for any synchronization. Would you accept it?
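A minimal sketch of the pooling idea, not the code from Edgar's JDBC PM: each caller borrows a resource for exclusive use and returns it afterwards, so callers no longer contend for one shared connection. The class `FixedPool` and its method names are hypothetical; in a real PM the type parameter would be `java.sql.Connection`, and isolation between concurrent callers would be left to database transactions.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical fixed-size pool; C would be java.sql.Connection in a real PM. */
final class FixedPool<C> {
    // Idle resources; the queue itself is thread-safe, so the pool needs no locks.
    private final BlockingQueue<C> idle;

    FixedPool(Collection<C> resources) {
        if (resources.isEmpty()) {
            throw new IllegalArgumentException("pool needs at least one resource");
        }
        idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    /** Returns an idle resource, or null if none frees up within the timeout. */
    C borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    /** Hands a previously borrowed resource back to the pool. */
    void giveBack(C resource) {
        idle.offer(resource);
    }

    /** Borrows a resource, runs the work with it, and always returns it. */
    <R> R withResource(long timeoutMillis, Function<C, R> work) {
        C resource = borrow(timeoutMillis);
        if (resource == null) {
            throw new IllegalStateException("no resource available within timeout");
        }
        try {
            return work.apply(resource);
        } finally {
            giveBack(resource);
        }
    }

    /** Number of resources currently idle. */
    int idleCount() {
        return idle.size();
    }
}
```

With something like this in place, `load()` and `exists()` could each run inside `withResource(...)` on their own connection. On an application server, a container-managed `javax.sql.DataSource` would normally play this role instead of a hand-written pool.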


Some enhancements that crossed my mind are:
- use a separate read-only connection for load() and exists() operations
- use a pool of prepared statements for load() and exists()

There are issues with a single- or double-connection design, besides the fact that (J2EE) applications are discouraged from managing system resources themselves:

  * No transaction isolation, which brings the need for synchronization
  * No keep-alive monitoring
  * No ability to re-establish a severed connection

As for statement caching, IIRC the driver does this.


With those changes we can then loosen some of the synchronization.

BLOBs are stored locally because many DBs are known for their bad performance when it comes to handling streams. So, speaking of enhancements, introducing a configuration choice for BLOB handling is probably another one.
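A configuration choice for BLOB handling could look roughly like the sketch below. All names here (`BlobStore`, `FileBlobStore`, `TableBlobStore`, `BlobStores.fromConfig`, and the mode strings `"local"` and `"db"`) are hypothetical and not part of Jackrabbit. The database variant is only an in-memory stand-in for a BLOB column, so that the example runs without a JDBC driver.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over where BLOB bytes are kept. */
interface BlobStore {
    void put(String id, byte[] data);
    byte[] get(String id); // returns null when the id is unknown
}

/** Keeps each BLOB as a file in a local directory. */
final class FileBlobStore implements BlobStore {
    private final Path dir;

    FileBlobStore(Path dir) {
        this.dir = dir;
    }

    public void put(String id, byte[] data) {
        try {
            Files.write(dir.resolve(id), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] get(String id) {
        Path file = dir.resolve(id);
        try {
            return Files.exists(file) ? Files.readAllBytes(file) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** In-memory stand-in for a table with a BLOB column (assumption for the sketch). */
final class TableBlobStore implements BlobStore {
    private final Map<String, byte[]> rows = new ConcurrentHashMap<>();

    public void put(String id, byte[] data) {
        rows.put(id, data.clone()); // copy so callers cannot mutate stored bytes
    }

    public byte[] get(String id) {
        byte[] data = rows.get(id);
        return data == null ? null : data.clone();
    }
}

/** Picks the BLOB strategy from a configuration value. */
final class BlobStores {
    static BlobStore fromConfig(String mode, Path localDir) {
        switch (mode) {
            case "local":
                return new FileBlobStore(localDir);
            case "db":
                return new TableBlobStore();
            default:
                throw new IllegalArgumentException("unknown blob mode: " + mode);
        }
    }
}
```

The point of the interface is that the persistence manager would not care which variant is configured: a single-node deployment could keep files local for speed, while a clustered one could keep all BLOBs in the shared database.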

Locally stored BLOBs might be OK for a non-clustered environment. They might even be OK in some cluster deployments, if there is a replication mechanism.

But I don't think it is a good idea to replicate the full set of BLOBs to every server that happens to need access to the repository (multiple times, if a server runs more than one webapp). I prefer having all BLOBs in one place, even if it is a bit slower...

Vadim
