Hi,

We've been doing some thinking about how to get round an issue we're having that's causing lots of grief, and I thought I'd see what you guys reckon. Please correct me if I'm wrong in my understanding.
The way Jackrabbit is currently structured, if versioning is enabled it is not possible to achieve a truly transactionally secure repository. The reason is that the version history and the workspace(s) in which data are stored use separate database connections. This means that a versioning operation (e.g. checkIn) executes two separate transactions. If the second of these fails, the system rolls back only that one, not the first, leaving the repository corrupted to an extent that it can't be recovered using the JCR API - e.g. a node in a workspace contains a reference to a base version that does not exist in the version history. This has happened to us on numerous occasions in production, resulting in many hours of lost author time, slipped publication deadlines and considerable frustration.

There appear to be two ways to resolve this issue (that I can think of):

1. Implement distributed transactions across the two database connections. This would have a large performance overhead and would introduce complex dependencies on an external distributed transaction manager. It seems like overkill within a single application - perhaps useful if one were using JR and an external DB within the same user transaction, but not for a single user txn.

2. Allow Jackrabbit to use a single database connection/transaction for a single user transaction.

Without dwelling on how JR has been implemented: if there is truly an accepted use case for a transactional repository, as the spec suggests, then the latter approach seems natural - it would seem crazy for any other single-persistence application to require multiple database connections for a single user transaction. However, as things stand, JR was built on the assumption that people would be using different data sources for versioning, workspaces, blob storage, etc.
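To make the failure mode concrete, here is a minimal, self-contained sketch of the atomicity gap. The class and field names (Store, versionStore, workspaceStore) are hypothetical stand-ins, not Jackrabbit's real classes; the point is only that two independently committed connections cannot give you an atomic checkIn - if one commit fails after the other has succeeded, the repository ends up half-updated:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the two-connection problem described above.
// Each Store stands in for one DB connection with its own local transaction.
public class SplitCommitDemo {

    static class Store {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private final boolean failOnCommit;

        Store(boolean failOnCommit) { this.failOnCommit = failOnCommit; }

        void write(String item) { pending.add(item); }

        void commit() {
            if (failOnCommit) {
                pending.clear(); // local rollback of THIS store only
                throw new RuntimeException("commit failed");
            }
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Workspace connection commits fine; version-history connection fails.
    static final Store workspaceStore = new Store(false);
    static final Store versionStore = new Store(true);

    // A "checkIn" touches both stores: update the node's base-version
    // reference in the workspace, and write the new version to the
    // version history - two connections, two transactions.
    static void checkIn() {
        workspaceStore.write("/content/node -> baseVersion 1.1");
        versionStore.write("version 1.1 of /content/node");
        try {
            workspaceStore.commit(); // transaction 1 succeeds
            versionStore.commit();   // transaction 2 fails, rolls back alone
        } catch (RuntimeException e) {
            // Transaction 1 was already committed and is NOT undone:
            // the workspace now references a base version that does not
            // exist in the version history - exactly the corruption the
            // JCR API then cannot repair.
        }
    }

    public static void main(String[] args) {
        checkIn();
        System.out.println("workspaceStore: " + workspaceStore.committed);
        System.out.println("versionStore:   " + versionStore.committed);
    }
}
```

With a single shared connection (option 2), both writes would sit in one transaction and either both commit or both roll back; with XA (option 1) the two resource managers would be coordinated by a two-phase commit instead of the naive sequential commits above.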
This means that each workspace / version history gets its own (single, shared, but that's another issue :-)) DB connection. I wonder how valid this assumption really is? Is anyone out there really doing this with their applications? I can see the demand for a super-fast non-transactional persistence mechanism, and I obviously know the demand for the other extreme - but what about the mixed model?

How about the use of a DB persistence manager in combination with a DB FileSystem? Is this a common use case? We are using it because we have a requirement to keep all non-transient data in a database (for backup / DR purposes). Suggestions we've made in the past to make it easier to administer have been rejected, however, so perhaps it's not being used much elsewhere?

I'd like to suggest an improvement to rework JR to allow a truly transactionally safe persistence model, but I thought I'd chuck this out for consideration first. Any comments / thoughts?

Miro
