Hi,

We've been doing some thinking about how to get round an issue we're having that's causing lots of grief, and I thought I'd see what you guys reckon. Please correct me if I'm wrong in my understanding.
The way Jackrabbit is currently structured, if versioning is enabled it is not possible to achieve a truly transactionally secure repository. The reason is that the version history and the workspace(s) in which data are stored use separate database connections. This means that a versioning operation (e.g. checkIn) executes two separate transactions. If the second of these fails, the system rolls back only that one, not the first, leaving the repository corrupted to an extent that it can't be recovered using the JCR API - e.g. a node in a workspace contains a reference to a base version that does not exist in the version history. This has happened to us on numerous occasions in production, resulting in many hours of lost author time, slipped publication deadlines and considerable frustration.

There appear to be two ways to resolve this issue (that I can think of):

1. Implement distributed transactions across the two database connections. This would have a large performance overhead and would introduce complex dependencies on an external distributed transaction manager. It seems like overkill within a single application - perhaps useful if one were using JR and an external DB within the same user transaction, but not for a single user txn.

2. Allow Jackrabbit to use a single database connection/transaction for a single user transaction.

Without dwelling on how JR has been implemented: if there is truly an accepted use case for a transactional repository, as the spec suggests, then the latter approach seems natural - it would seem crazy for any other single-persistence application to require multiple database connections for a single user transaction. However, as things stand, JR was built on the assumption that people would be using different data sources for versioning, workspaces, blob storage, etc.
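To make the failure mode concrete, here is a minimal, self-contained sketch of the atomicity gap. The class and field names (Store, versionStore, workspaceStore) are hypothetical stand-ins, not Jackrabbit's real classes; the point is only that two independently committed connections cannot give you an atomic checkIn - if one commit fails after the other has succeeded, the repository ends up half-updated:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the two-connection problem described above.
// Each Store stands in for one DB connection with its own local transaction.
public class SplitCommitDemo {

    static class Store {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private final boolean failOnCommit;

        Store(boolean failOnCommit) { this.failOnCommit = failOnCommit; }

        void write(String item) { pending.add(item); }

        void commit() {
            if (failOnCommit) {
                pending.clear(); // local rollback of THIS store only
                throw new RuntimeException("commit failed");
            }
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Workspace connection commits fine; version-history connection fails.
    static final Store workspaceStore = new Store(false);
    static final Store versionStore = new Store(true);

    // A "checkIn" touches both stores: update the node's base-version
    // reference in the workspace, and write the new version to the
    // version history - two connections, two transactions.
    static void checkIn() {
        workspaceStore.write("/content/node -> baseVersion 1.1");
        versionStore.write("version 1.1 of /content/node");
        try {
            workspaceStore.commit(); // transaction 1 succeeds
            versionStore.commit();   // transaction 2 fails, rolls back alone
        } catch (RuntimeException e) {
            // Transaction 1 was already committed and is NOT undone:
            // the workspace now references a base version that does not
            // exist in the version history - exactly the corruption the
            // JCR API then cannot repair.
        }
    }

    public static void main(String[] args) {
        checkIn();
        System.out.println("workspaceStore: " + workspaceStore.committed);
        System.out.println("versionStore:   " + versionStore.committed);
    }
}
```

With a single shared connection (option 2), both writes would sit in one transaction and either both commit or both roll back; with XA (option 1) the two resource managers would be coordinated by a two-phase commit instead of the naive sequential commits above.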
This means that each workspace / version history gets its own (single, shared, but that's another issue :-)) DB connection. I wonder how valid this assumption really is? Is anyone out there really doing this with their applications? I can see the demand for a super-fast non-transactional persistence mechanism, and I obviously know the demand for the other extreme - but what about the mixed model?

How about the use of a DB persistence manager in combination with a DB FileSystem? Is this a common use case? We are using it because we have a requirement to keep all non-transient data in a database (for backup / DR purposes). Suggestions we've made in the past to make it easier to administer have been rejected, however, so perhaps it's not being used much elsewhere?

I'd like to suggest an improvement to rework JR to allow a truly transactionally safe persistence model, but I thought I'd chuck this out for consideration first. Any comments / thoughts?

Miro
