Hi Selcuk, this sounds very interesting. Looking forward to see that solution in action.
In the meantime I hacked a bit on the current JDBM code, I'll attach a patch to Jira. Kind Regards, Stefan On Thu, Aug 18, 2011 at 10:41 PM, Selcuk AYA <[email protected]> wrote: > Hi, > Today we had some discussion with Alex, Emmanuel and others on how we > can improve jdbm consistency semantics. I had spent sometime looking > into this issue and thought it could be useful to put a summary of my > findings here. > > Currently, jdbm has issues with both concurrency and consistency: > 1) jdbm table lookups, insert and remove interfaces are synchronized > methods. So even if all the directory server does is to lookups on > tables, all lookups will be serialized. Moreover, the record manager > operations are all synchronized methods too. This means, for example, > while sync of dirty pages to disk goes on, no lookup operation can go > ahead. > > 2) jdbm browser interface does not provide any consistency guarantees. > If there are underlying changes to the store while the browser is > open, then it might return inconsistent results. I think the situation > is even worse if the underlying record manager is CacheRecordManager > as the same page could be modified and read by a browser concurrently. > > I have been working on a scheme which introduces what can be defined > as action consistency into the jdbm store. > 1) Actions are lookup, insert, remove and browse. Each action is > assigned a unique version. Actions are ReadWrite or ReadOnly. > 2) We allow one ReadWrite action and multiple ReadOnly actions to run > concurrently.So synchronized methods will be removed. > 3)We introduce a new record manager which caches jdbm B+ pages. Each > page in the cache has a [startVersion, endVersion). When an action > with version V1 wants to read a page, its read can be satisfied > satisfied from that page's version with V1 >= startVersion && V1 < > endVersion. > 4) Pages' previous versions are kept in memory. A page can be purged > when the minimum version among all active actions is >= endVersion. > > So say we have three pages in a chain (A0->B0->C0) and each of them > has version range [0, infinity). An write action starts and gets the > version number 1. It updates B0 and C0 to B1 and C1 in any order. > After these two updates, B0 and C0 will have version range [0,1) and > and B1 and C1 will have version range [1,infinity). Before the write > action completes, a read action comes, gets the current read version > which is 0 and reads the chain. Since B0 and C0 will be the versions > that can satisfy this read, the read only action will read the chain > A0->B0->C0. When write action completes, it posts version 1 as the new > read version. First read action completes, a second one starts with > version 1 and that one will read A0->B1->C1. Since the minimum read > version is now 1, B0 and C0 can be zapped. > > Concerns: > 1)Previous versions of B+ tree pages could consume too much memory. As > long as actions are kept small, this is not a problem. Only the > browsing action does not follow this rule. There are a couple of > options to deal with it. We can maybe spill previous versions to disk > after some memory limit. Or we can think about chopping down browsing > action into smaller read actions. Another way to deal with this > problem would be to keep the previous versions of pages on disk rather > than in memory. On disk, versions for a B+ page would form a > chain.This is a radically different way of introducing action > consistency but I thought this unneccesarily complicates free space > management while all we need to do with old versions of pages after a > restart is to toss them away. > > > thanks, > Selcuk > > > > > On Thu, Aug 18, 2011 at 11:48 AM, Emmanuel Lecharny (JIRA) > <[email protected]> wrote: >> >> [ >> https://issues.apache.org/jira/browse/DIRSERVER-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086892#comment-13086892 >> ] >> >> Emmanuel Lecharny commented on DIRSERVER-1642: >> ---------------------------------------------- >> >> Damn... The more I think about the issue, the more I find it critical. >> >> If we remove an entry while someone is doing a search, the search will fail. >> Also we have some problem during replication, just when we try to replicate >> some deletion, and it might prefectly explain why we get those issues. >> >>> Unexpected behaviour in JdbmIndex >>> --------------------------------- >>> >>> Key: DIRSERVER-1642 >>> URL: https://issues.apache.org/jira/browse/DIRSERVER-1642 >>> Project: Directory ApacheDS >>> Issue Type: Bug >>> Reporter: Stefan Seelmann >>> Attachments: IndexTest.java >>> >>> >>> During my experiments and tests of removing one-level and sub-level indices >>> at least one integration test "SearchAuthorizationIT" failed (the test >>> fails recursivelyDelete()). A debugging session showed that the follwing: >>> - in recursivelyDelete() multiple search requests are done which leads to >>> multiple open cursors in the XDBM search engine >>> - an entry is deleted >>> - when the open cursors are advanced wrong/unexpected entries are returned >>> I was able to create a small test that shows the problem: >>> - the index contains six tuples: >>> (a,1) >>> (b,2) >>> (c,3) >>> (d,4) >>> (e,5) >>> (f,6) >>> - a cursor over the index is created and advanced two times, the expected >>> tuples (a,1) and (b,2) were returned >>> - now tuple (c,3) is deleted >>> - when the cursor is advanced again the tuple (b,2) is returned again! I >>> had expected (d,4). >>> Note that this doesn't happen with AvlIndex. >> >> -- >> This message is automatically generated by JIRA. >> For more information on JIRA, see: http://www.atlassian.com/software/jira >> >> >> >
