Miro Walker wrote:
> We've been discussing the DB PM implementation and have a couple of
> questions about it. At the moment, the Simple DB PM appears to have
> been implemented using a single connection, with all write operations
> synchronised on a single object. This would imply that all writes to
> the database are single-threaded, effectively making any application
> that uses it single-threaded for write operations as well. This
> appears to have two implications:
This is not quite true. The actual store operation on the persistence
manager is synchronized. However, most write calls from different
threads to the JCR API in Jackrabbit will not block each other, because
those changes are made in a private transient scope. Only the final
save or commit of the transaction is serialized, and that is just one
part of the whole write process.
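
To illustrate, here is a rough sketch against the JCR API (class and
node names are made up, this is not actual Jackrabbit code):

    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    // rough sketch: two sessions write "concurrently"; the transient
    // changes never block each other, only the save() calls are
    // serialized in the persistence manager
    public class TransientScopeDemo {
        public static void demo(Repository repository)
                throws RepositoryException {
            SimpleCredentials creds =
                    new SimpleCredentials("user", "pass".toCharArray());
            Session s1 = repository.login(creds);
            Session s2 = repository.login(creds);

            s1.getRootNode().addNode("fromSessionOne"); // transient in s1
            s2.getRootNode().addNode("fromSessionTwo"); // transient in s2

            s1.save(); // store in the persistence manager, serialized
            s2.save(); // waits only while another store is in progress
        }
    }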
> 1. Performance - in a multi-user system, having single-threaded writes
> to the database will make the JDBC connection a serious bottleneck as
> soon as the application comes under load. It also means that any
> background processing that needs to iterate over the repository making
> changes (and we have a few of those) will effectively bring all other
> users to a grinding halt.
This depends very much on the use case. Again, all changes that such a
background process makes are first made in a transient scope; other
sessions are affected, if at all, only when the changes are stored in
the persistence manager.

While one session stores its changes, other sessions are still able to
read certain items, as long as those are available in their
LocalItemStateManager. Only when other sessions access items that are
not available in their LocalItemStateManager will they be blocked until
the store is finished.
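
If blocking during a long store is a concern, a background process can
also save in smaller batches, so each serialized store phase stays
short. A hypothetical sketch (the property name and the batch size are
made up):

    import javax.jcr.Node;
    import javax.jcr.NodeIterator;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    // hypothetical sketch: a background job that saves in batches,
    // keeping each serialized store phase short for other sessions
    public class BackgroundJob {
        public static void touchAll(Session session, Node parent)
                throws RepositoryException {
            int count = 0;
            for (NodeIterator it = parent.getNodes(); it.hasNext();) {
                Node child = it.nextNode();
                child.setProperty("processed", true); // transient only
                if (++count % 100 == 0) {
                    session.save(); // only this competes with writers
                }
            }
            session.save(); // persist the remainder
        }
    }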
> 2. Transactions - we haven't tested this (as the recent support for
> transactions in versioning operations has not been integrated into
> our system), but it appears that if a single connection is being
> used, then we can only have a single transaction active at any one
> time. So, if each user tries to execute a transaction with multiple
> write operations in it, and these transactions are to be propagated
> through to the database, then each transaction must complete before
> the next can begin. This would mean either that we get exceptions
> when the system attempts to interleave operations from different
> transactions, or that each transaction must complete in full before
> another can begin, further compounding the performance issue.
The scopes of a JCR transaction and of the transaction on the
underlying database used by Jackrabbit are not the same. A JCR
transaction starts with the first modified item, whereas the
transaction on the underlying database starts with the call to
Item.save() or Session.save(), or with the JTA transaction commit
(whatever you prefer ;)). That basically means JCR transactions can
run in parallel most of the time; only the commit phase of a JCR
transaction is serialized.
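
For illustration, a rough sketch of the two scopes using Jackrabbit's
XASession (the Xid implementation is omitted, and this is not
production code):

    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;
    import org.apache.jackrabbit.core.XASession;

    // rough sketch: the JCR transaction is "running" from the first
    // change, but the database is only touched in the commit phase
    public class TwoScopesDemo {
        public static void demo(XASession session, Xid xid)
                throws Exception {
            XAResource xares = session.getXAResource();

            xares.start(xid, XAResource.TMNOFLAGS);
            session.getRootNode().addNode("inTransaction"); // transient
            session.save(); // enters the transaction scope, no db commit yet
            xares.end(xid, XAResource.TMSUCCESS);

            xares.prepare(xid);
            xares.commit(xid, false); // only here is the db committed
        }
    }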
> In addition to the implications of using a single synchronised
> connection, another issue appears to be that the system will be
> unable to recover from a connection failure. For example, if the
> system were deployed onto a highly available database cluster, then
> in the event of a DB instance failure, any open connections will be
> killed but can quite happily be reopened later. Jackrabbit appears to
> create a connection on initialisation and has no way to recover if
> that connection is killed.
This is certainly an issue with the SimpleDbPersistenceManager. I guess
that's why it is called Simple...
IMO the SimpleDbPersistenceManager is mainly intended for embedded
databases, where a connection failure is highly unlikely because there
is no network in between.
> I know that questions around implementing support for connection
> pooling on the DB have been raised before and then dismissed as
> unimportant, but this appears to me to be pretty fundamental. By
> using a connection pool implementation that supports recreating dead
> connections and tying a connection to a transaction context, multiple
> transactions could run in parallel, helping throughput and making the
> system more reliable.
Even if such a persistence manager allowed concurrent writes, it would
still be the responsibility of the caller to ensure consistency. In
our case that's the SharedItemStateManager, and that is where
transactions are currently serialized, but only on commit.

If concurrent write performance should become a real issue, that's
where we would first have to deal with it.
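
For what it's worth, recreating dead connections could look roughly
like this with Jakarta Commons DBCP (a hypothetical sketch; the table
and column names are made up, and it ignores the consistency aspect
described above):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import org.apache.commons.dbcp.BasicDataSource;

    // hypothetical sketch of the pooling idea: borrow a connection per
    // store instead of holding one for the lifetime of the persistence
    // manager; a connection killed by a db failover is simply replaced
    public class PooledStoreSketch {
        private final BasicDataSource ds = new BasicDataSource();

        public PooledStoreSketch() {
            ds.setDriverClassName("org.postgresql.Driver"); // examples
            ds.setUrl("jdbc:postgresql://dbhost/jackrabbit");
            ds.setUsername("jcr");
            ds.setPassword("secret");
            ds.setDefaultAutoCommit(false);
        }

        public void store(String id, byte[] data) throws SQLException {
            Connection con = ds.getConnection(); // fresh or recycled
            try {
                PreparedStatement stmt = con.prepareStatement(
                        "update NODE set DATA = ? where ID = ?");
                stmt.setBytes(1, data);
                stmt.setString(2, id);
                stmt.executeUpdate();
                con.commit();
                stmt.close();
            } finally {
                con.close(); // returns the connection to the pool
            }
        }
    }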
regards
marcel