Marcel Reutegger
Mon, 12 Mar 2007 07:16:40 -0800
Hi Bryan, Bryan Davis wrote:
It is also tempting to try and push this functionality into the database, since it is already doing all of the required locking anyway. A custom transaction manager that delegated to the repository session transaction manager (thereby associating JDBC connections with the repository session), in conjunction with a stock data source implementation (see below) might do the trick. Of course this would only work with database PM¹s, but perhaps other TM¹s could still have the existing SISM locking enabled. This would be good enough for us since we only use database PM¹s, and a better, more universal solution could be implemented at a later date. Has anyone looked into this issue at all or have any advice / thoughts?
well, it's not that easy. locking in SISM is required because there is a cache of item states. the same requirements for a persistence manager also applies to this cache. specifically, after the change log got persisted the cache must be updated atomically.
there were some recent thoughts about implementing a multi-version cache (similar to MVCC in databases), which would support better concurrency. but that requires more effort than just removing some synchronized modifiers in the source code.
Issue #2: Multiple issues in database persistence managers. I believe the database persistence managers have multiple issues (please correct me if I get any of this wrong). 1. JDBC connection details should not be in the repository.xml. I should be free to change the specifics of a particular database connection without it constituting a repository initialization parameter change (which is effectively what changing the repository.xml is, since it gets copied and subsequently accessed from inside the repository itself). If a host name or driver class or even connection URL changes, I should not have to manually edit internal repository configuration files to effect the change.
there's a org.apache.jackrabbit.core.persistence.db.JNDIDatabasePersistenceManager, which exactly does what you need.
2. Sharing JDBC connections (and objects obtained from them, like prepared statements) between multiple threads is not a good practice. Even though many drivers support such activity, it is not specifically required by the spec, and many drivers do not support it. Even for ones that do, there are always a significant list of caveats (like changing the transaction isolation of a connection impacting all threads, or rollbacks sometimes being executed against the wrong thread). Plus, as far as I can tell, there is also no particular good reason to attempt this in this case. Connection pooling is extremely well understood and there are a variety of implementations to choose from (including Apache¹s own in Jakarta Commons).
so far we didn't see any issues with our approach of sharing a connection among threads. we are always happy to accept contributions in that area ;)
3. Synchronization is bad (generally speaking of course :).
one could also say (generally speaking) 'no synchronization' is bad. if you have a multi-threaded program without any synchronization and is super-fast, but just doesn't behave correctly, what good is it then?
Once the multithreaded issues of JDBC are removed (via a connection pool), there are no good reasons that I can see to have any synchronization in the database persistence managers.
that's correct but won't solve your issue, because of JCR-314.btw. you can easily implement your own persistence manager based on the JNDIDatabasePersistenceManager that provides concurrent reads. I think we didn't implement one so far because we didn't see much value in it.
Since any sort of requirement for synchronized operation would be coming from a higher layer, it should also be provided at a higher layer. I have always felt that a good rule of thumb in server code is to avoid synchronization at all costs, particular in core server functionality (like reading and writing to a repository). It if extremely difficult to fully understand the global implications of synchronized code, particular code that is synchronized at a low level. Any serialization of user requests is extremely serious in a multithreaded server and, in my experience, will lead to show-stopping performance and scalability issues in nearly all cases. In addition, serialization of requests at such a low level probably means that other synchronized code that is intended to be properly multithreaded is probably not well tested since the request serialization has eliminated (or greatly reduced) the possibility of the code being reentered like it normally would be.
please note that this serialization of writes currently only takes place at the very last step when changes are stored. all other operations like constraint and node type validation, access right checks, check if the save is self contained, etc. can be done with multiple threads concurrently.
The solution to all of these issues, maybe, is to use the standard JDBC DataSource interface to encapsulate the details of managing the JDBC connections. If all of the current PM and FS implementations that use JDBC were refactored to have a DataSource member and to get and release connections inside of each method, then parity with the existing implementations could be achieved by providing a default DataSourceLookup strategy implemention that simply encapsulated the existing connection creation code (ignoring connection release requests).
I can't follow you here, how would that help?
This would allow us (and others) to externally extend the implementation with alternative DataSourceLookup strategy implementations, say for accessing a datasourcefrom JNDI, or getting it from a Spring application context. This solutionalso neatly externalizes all of the details of the actual datasource configuration from the repository.xml.
using JNDI is already possible and implementing a database persistence manager which obtains its connection from a spring context is fairly easy. you just have to implement the method DatabasePersistenceManager.getConnection() accordingly.
regards marcel