DERBY-700, how to get a lock across multiple class loaders in the same JVM.

Mike Matrigali Thu, 22 Mar 2007 16:01:57 -0800

I would like to see some solution to DERBY-700 go into the 10.3 release,
as it is currently too easy for users to access the same database from
2 different classloaders in the same JVM and corrupt their database.


Let me try to explain the problem again.  Derby protects multiple thread
access to databases using in memory locking.  Any situation where 2
instances of Derby can access the same on disk database but not
coordinate through the same in-memory locking scheme creates a situation
that can lead to log/data corruption.


Derby coordinates access to an on disk database using a 2 level per db
locking scheme.  The following is high level and probably misses some
detail.

1) First it obtains what I refer to as an OS file lock; this is not a
java defined mechanism.  It depends on an OS specific behavior: on some
OS's it is not possible to delete a file if the file is still open
(the most common OS's where this is true are, I believe, all Windows
versions: 98, NT, 2000, XP, ...).

    A) If the lock file does not exist we open it and leave it open
       --> lock granted.

    B) If the lock file does exist we try to delete it; if we can then
       we go back to A.  If we can't we assume the file is locked and
       give up.

2) The second level is that we use java based file locking.  This only
became available in 1.4.2 and later jvm's.  This does standard
locking and automatically takes care of releasing the lock if the JVM
goes away.
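
The two steps above can be sketched roughly as follows.  This is only an
illustration under my own assumptions, not Derby's actual code: the class
name TwoLevelDbLock is made up, and using one lock file for both steps is
a simplification.  Step 1 only excludes anyone on an OS that refuses to
delete an open file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical sketch of the two-level lock; not Derby's real code. */
final class TwoLevelDbLock {
    private final File lockFile;
    private final RandomAccessFile handle; // step 1: kept open while we own the db
    private final FileLock nioLock;        // step 2: released by the OS if the JVM dies

    private TwoLevelDbLock(File lockFile, RandomAccessFile handle, FileLock nioLock) {
        this.lockFile = lockFile;
        this.handle = handle;
        this.nioLock = nioLock;
    }

    /** Returns the lock, or null if another instance appears to hold it. */
    static TwoLevelDbLock tryAcquire(File lockFile) {
        // Step 1 B) The lock file exists: try to delete it.  Where the OS
        // refuses to delete an open file, failure means somebody holds it.
        if (lockFile.exists() && !lockFile.delete()) {
            return null;
        }
        try {
            // Step 1 A) No lock file (any more): create it and leave it open.
            RandomAccessFile handle = new RandomAccessFile(lockFile, "rw");

            // Step 2) Java file lock.  tryLock() returns null when another
            // process holds the lock; some JVMs throw
            // OverlappingFileLockException when the holder is in this JVM.
            FileLock nioLock;
            try {
                nioLock = handle.getChannel().tryLock();
            } catch (OverlappingFileLockException e) {
                nioLock = null;
            }
            if (nioLock == null) {
                handle.close();
                return null;
            }
            return new TwoLevelDbLock(lockFile, handle, nioLock);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Releases both levels and removes the lock file. */
    void release() {
        try {
            nioLock.release();
            handle.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        lockFile.delete();
    }
}
```

A caller would do something like TwoLevelDbLock.tryAcquire(new File(dbDir,
"db.lck")) at boot and refuse to boot on null; the file name here is just
a placeholder.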

We kept both steps, even though it seems that step 2 alone is sufficient,
to provide backward compatible protection: anyone running on a JVM
prior to 1.4.2 on a Windows platform is still protected.  The
pre-derby code base also had to worry about protecting access against
versions of the code that did not have step 2 implemented.

The problem is that in cases where derby is run in 2 class loaders in
the same jvm the step 2 file locking does not work.  It is meant only
to protect threads in different JVMs, so it provides no help in
preventing access from 2 different class loaders in the same JVM.  It
turns out that step 1 also solves the locking problem for multiple
class loaders on Windows systems.

So a solution should provide 2 things:
1) prevent access from another classloader in the same JVM

2) not allow false positives.  For instance, a standard "lock file"
could be used on unix systems: create it at boot, and on any later boot
check for existence of the lock file and give up if it exists.  The
problem is that it is very easy to cause a JVM to exit without properly
cleaning up the lock file, and thus one would get into a situation where
the user may have to clean up lock files by hand.

Here is my current proposal, but I really don't like it -- I am hoping
someone out there can come up with something better.

o keep the step 1 and step 2 locking as described above.  It solves
  cross JVM locking completely.

o create a step 3 locking step, the only purpose is to recognize
  situations step 1 and 2 can't.

o use simple file system lock file to implement step 3 locking:

    A) on db boot if no lock file exists, create it and put a timestamp
       in the file.  This db boot is responsible for updating that
       timestamp every N seconds; for this we need to come up with a
       guaranteed executing background thread - there are known problems
       with the current background thread, and long checkpoints for
       instance.  On shutdown of the db we delete the lock file.

    B) on db boot if the lock file exists we open it, get the timestamp
       and compare it with the current timestamp.  If the difference is
       greater than N we assume the lock file has been left around
       incorrectly and we delete the file and go to A (or just open it
       and update it with our current timestamp).  It probably is worth
       logging this event in derby.log.
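
To make the step 3 proposal concrete, here is a minimal sketch of the
boot-time check and the heartbeat.  Everything in it is an assumption for
illustration: the class name TimestampLockFile, storing the timestamp as
decimal milliseconds, and passing the clock value in as a parameter.  It
does not address the check-then-write race between two booting instances
or the "guaranteed executing background thread" problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the step 3 timestamp lock file; not Derby code. */
final class TimestampLockFile {

    /** True when the recorded heartbeat is more than nMillis old. */
    static boolean isStale(long recordedMillis, long nowMillis, long nMillis) {
        return nowMillis - recordedMillis > nMillis;
    }

    /**
     * A) no lock file: create it with our timestamp, lock granted.
     * B) lock file exists: grant only if its timestamp is older than N.
     * A corrupt file makes Long.parseLong throw; a real version would
     * have to decide how to treat that case.
     */
    static boolean tryAcquire(Path lockFile, long nowMillis, long nMillis) {
        try {
            if (Files.exists(lockFile)) {
                String text = new String(Files.readAllBytes(lockFile),
                                         StandardCharsets.UTF_8).trim();
                long recorded = Long.parseLong(text);
                if (!isStale(recorded, nowMillis, nMillis)) {
                    return false; // somebody is still updating it: give up
                }
                // Stale: assume it was left behind and take it over.
                // This is the event worth logging in derby.log.
            }
            heartbeat(lockFile, nowMillis);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called by the owner every N seconds (or less) to refresh the file. */
    static void heartbeat(Path lockFile, long nowMillis) {
        try {
            Files.write(lockFile,
                        Long.toString(nowMillis).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called on clean shutdown of the db. */
    static void release(Path lockFile) {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use nowMillis would come from System.currentTimeMillis(); taking
it as a parameter just makes the stale check easy to exercise.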

One nice thing about the above solution is that I think it also solves
our problem with multiple machines accessing the same disk (as long as
their timestamps are the same or close).  I think we can pick a large N
as this should be an error case (ie. the purpose of N is to catch the
case where a classloader went away without allowing us to clean up - I
don't know enough classloader stuff to know how likely this is), but it
is probably worth making it configurable so we could adjust in the field
if necessary.

I really don't like forcing Derby to run a job every N seconds.  It
could be hard to explain to users why derby is doing work every N
seconds even when nothing else is happening.  I worked on a different
product a long time ago that required us to maintain our own timer for
use in scheduling waits, and users noticed and complained, such that we
had to add a backoff mechanism based on the amount of work being done
just so the process would not show up at a steady 1% (or whatever) on
their process status monitors.  Now that timer was a lot shorter than
seconds so it may not be an issue.

Some extensions I considered:

1) come up with a unique ID that is specific to a JVM and can be queried
by any thread in any classloader in the JVM.  If we had that then we
could write that value into the step 3 lock file, and we would know that
if we opened the file and saw a different ID, the lock file was invalid.
If we did this then we narrow the chance of a false positive but we
eliminate the chance to catch cross machine access.

2) Have the opener db log some unique id specific to it in addition to
the timestamp, maybe this is just the first timestamp of the open.  Then
it could at least tell the next time if another
class loader had incorrectly started, and throw/log some error so we
could catch this problem.

3) maybe the value of N should be logged in the file, I think it has to
   be if we allow it to be configurable.
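
For extension 1, one possible way to get an ID visible to every class
loader is to park it in a JVM system property, since java.lang.System is
loaded by the bootstrap loader and so is shared.  This is only a sketch of
that idea: the class name JvmId and the property name are made up, and it
assumes nobody replaces the properties object via System.setProperties()
and that a security manager allows the property access.

```java
import java.util.Properties;
import java.util.UUID;

/** Hypothetical sketch of a per-JVM ID shared across class loaders. */
final class JvmId {
    // Made-up property name, for illustration only.
    static final String PROPERTY = "example.derby700.jvmId";

    /** Returns the same ID to every caller in this JVM, creating it once. */
    static String get() {
        // The Properties object belongs to java.lang.System, so copies of
        // this class in different class loaders synchronize on the same
        // monitor and see the same value.
        Properties props = System.getProperties();
        synchronized (props) {
            String id = props.getProperty(PROPERTY);
            if (id == null) {
                id = UUID.randomUUID().toString();
                props.setProperty(PROPERTY, id);
            }
            return id;
        }
    }
}
```

The value from JvmId.get() is what would be written into the step 3 lock
file next to the timestamp.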
