Re: [389-devel] RFC: New Design: Fine Grained ID List Size

Rich Megginson Fri, 13 Sep 2013 14:41:42 -0700

On 09/13/2013 02:39 PM, David Boreham wrote:

On 9/13/2013 2:18 PM, Rich Megginson wrote:
On 09/12/2013 07:08 PM, David Boreham wrote:
On 9/11/2013 11:41 AM, Howard Chu wrote:
Just out of curiosity, why is keeping a count per key a problem? If you're using BDB duplicate key support, can't you just use cursor->c_count() to get this? I.e., BDB already maintains key counts internally, why not leverage that?
afaik you need to pass the DB_RECNUM flag at DB creation time to get record counting behavior, and it imposes a performance and concurrency penalty on writes. Also afaik 389DS does not set that flag except on VLV indexes (which need it, and coincidentally were the original reason for the feature being added to BDB).
I'm using bdb 4.7 on RHEL 6.
Looking at the code, it appears the dbc->count method for btree is __bamc_count() in bt_cursor.c. I'm not sure, but it looks as though this function has to iterate each page counting the duplicates on each page, which makes it a non-starter. Unless I'm mistaken, it doesn't look as though it keeps a counter on each update, then simply returns the counter. I don't see any code which would make the behavior different depending on if DB_RECNUM is used when the database is created.
The DB_RECNUM count feature is not accessed via dbc->count() but through the dbc->c_get() call, passing DB_GET_RECNO, positioning at the last key. You do also need to use nested btrees for it to count the dups, afaik (but we're doing that in the DS indexes already I believe).


I wrote a small bdbtest.py script which uses the python bdb interface.
https://github.com/richm/scripts/blob/master/bdbtest.py

This creates an env, opens a db with bsddb.db.DB_DUPSORT|bsddb.db.DB_RECNUM, adds several non-dup and dup records, opens a cursor and iterates them. This is the output:


open dbenv in /var/tmp/dbtest
open db /var/tmp/dbtest/dbtest.db4
no txn records
    key=key0 val=data0
    extra=('', '\x01\x00\x00\x00')
<snip>
    key=key9 val=data9
    extra=('', '\n\x00\x00\x00')
    key=multikey val=multidata0
    extra=('', '\x0b\x00\x00\x00')
<snip>
    key=multikey val=multidata9
    extra=('', '\x0b\x00\x00\x00')

The extra is the str() output of cur.get(bsddb.db.DB_GET_RECNO)

So for all of the dup records, the recno is the same: '\x0b' == 11?
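
For reference, the extra values printed above look like raw 4-byte record numbers. A minimal sketch for decoding them, assuming they are unsigned 32-bit integers in little-endian byte order (which matches the output on this host, but is an assumption, not something confirmed from the BDB source); decode_recno is just a hypothetical helper name:

```python
import struct


def decode_recno(raw):
    """Decode a 4-byte record number such as the ones in the output above.

    Assumption: unsigned 32-bit, little-endian.
    """
    (recno,) = struct.unpack("<I", raw)
    return recno


# Byte strings copied from the cursor output above.
samples = {
    "key0": b"\x01\x00\x00\x00",
    "key9": b"\n\x00\x00\x00",
    "multikey": b"\x0b\x00\x00\x00",
}

decoded = {key: decode_recno(raw) for key, raw in samples.items()}
for key, recno in decoded.items():
    print(key, recno)
```

With that decoding, key0 is record 1, key9 is record 10, and every multikey duplicate reports record 11.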

I'm probably missing something, but how do I use this to get the number of duplicates?





--
389-devel mailing list
389-devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-devel