About operation atomicity

Emmanuel Lecharny Thu, 27 May 2010 08:40:44 -0700

Hi guys,

as we already know, we may have issue in the way we handle modifyoperations in the server (add, modify, delete, moddn). Basically, wejust synchronized the following methods in the AbstractStore class :

- add
- delete
- modify
- move
- rename

That's good enough to protect the server against concurrentmodifications, but the problem is that behind the curtain, they are allbeing processed by a backend implementation, which can be Jdbm, HBase,Avl, Oracle, ... All those guys are dealing with concurrent access intheir own way.

Let's focus on Jdbm atm, as it's our main backend. Every operationaccess many indexes and the master table. Each index is a couple ofJdbmTable (forward and backward), and the Master table itself is also aJdbmTable. A JdbmTable itself is backed by a BTree.


So far, so good.

It become a bit more complex when it comes to protecting all thisagainst concurrent access :

- the upper layer (AbstractStore) does not synchronize the read operations

- nor does the JdbmIndex (add, drop(K,Long), sync and close aresynchronized, lookups and more surprisingly drop(K) aren't)- nor does JdbmTable( close, put, remove and sync methods aresynchronized, get(K) isn't)- but all the bTree methods are synchronized, including the one thatread data.

That means that, whatever we do, we will have a bottleneck on readoperations, and we also have overzealous synchronization as every layerprotect all the modify operations. (regardless, this is not really anissue, as we don't care if we synchronize more than once as soon as weare already guaranteed to be thread safe once we get in the AbstractStore)

Now, the question is how can we release this contention on reads ?Obviously, we have to protect the way we modify data against concurrentmodifications, otherwise the index will be broken quickly, but we wantto allow searches to be done in parallel. That mean we must remove allthe synchronization on reads from the BTree implementation.

If we do that, how do we guarantee that a search based on an index willreturn the correct data, instead of getting an exception because theindex is being reorganized by another thread ?

Considering we keep the current JDBM implementation, I see two ways todo that :- implementing a kind of barrier that let all the reads to be donewithout syncronization, blocking writes when they arrive until all thepending reads to be done, then all the following operations will have towait until the write is done. If the next operations are read, then oncethe write is done, reads can be done in parallel again.- using an optimistic lock system, assuming that a read is successful ifno write occured between the time the read started and the time itfinished. If not, then we redo the read. This can be done using a uniquemarker, which is incremented by any write operation. If we get anyexception during a search, that means we tried to access some data whichis not accessible just because a write is modifying the data structure,then we redo the read.

The second strategy is not 100% safe, as we have no guarantee n case westart a read just after a write has started, and finish the read beforethe write has finished. Let's see what it means on a time framedescription :


scenario 1:
-----------

Thread 1 starts a read at t0
Thread 2 starts a read at t1
Thread 3 starts a write at t3 : it blocks until T0 and T1 are done

At t3, all the new read or writes operation are just blocked, waitingfor Thread 3 to be done.


Thread 4 starts a read at t4 : it blocks waiting for T3 to be done
Thread 5 starts a read at t5 : it blocks waiting for T3 to be done
Thread 1 finishes at t6. Nothing changes.
Thread 6 starts a write at t7 : it blocks waiting for T3 to be done

Thread 2 finishes at t8 : Thread 3 is now unblocked, and can beprocessed. Threads 4, 5 and 6 are still blocked, waiting for T3 to be doneThread 3 finishes at t9 : we can now process the pending threads : T4,T5 and T6

Thread 4 and 5 are reads, they are executed concurrently
Thread 6 is a write, it has to wait for both T4 and T5 to be done
etc...

Here, reads are executed concurrently without synchronization, butblocking the writes, and when a writes come in, everything is blockeduntil the write is done.


Scenario 2 :
------------

when a write starts, it increment a TS which is global (AtomicLong, forinstance)


case 1 :

     W start   W ends
     |         |
     V         V
-----o.........o----
  ^         ^
  |         |
  R starts  R ends

in this case, the read will grab the TS, and when it ends, the TS itgrabs will be different, as a write has started. We can assume that thedata we got is not valid anymore, and we redo a read.


case 2 :

     W start   W ends
     |         |
     V         V
-----o.........o----
  ^              ^
  |              |
  R starts       R ends

same as case 1

case 3 :

     W start   W ends
     |         |
     V         V
-----o.........o----
        ^         ^
        |         |
        R starts  R ends

Here, the TS is equal, so we may have an exception, or an invalidresult. If we get an exception, we can simply restart the read. We haveno way to know if the result is valid, so we assume it is, because wedon't care if iit's not (once the user has got a result, as LDAP doesdirty reads anyway, we are safe)


case 4 :


     W start         W ends
     |               |
     V               V
-----o...............o----
        ^         ^
        |         |
        R starts  R ends

Same as case 3.

--ends of scenarios--

So we have a scenario 1 where we are totally thread safe, but wherereads can be blocked by a write. This is 100% safe, with less contentionthan what we have right now.

Scenario 2 is a bit less safe, but we don't block reads : we just redothem if we get an exception.

Any better solution implies we implement a better underlying system,based on MVCC for instance.

We also have to keep in mind that writes impact more than one table, andthey should be atomic, so we must synchronize writes anyway.

Now, thanks for having read this long brain dump, but I'd like to knowwhat's your opinion about those two strategies, which one should weimplement, assuming I'm not off base or that you don't have a bettersolution to propose ?







--
Regards,
Cordialement,
Emmanuel Lécharny
www.nextury.com

About operation atomicity

Reply via email to