On 09/13/2013 02:39 PM, David Boreham wrote:
On 9/13/2013 2:18 PM, Rich Megginson wrote:
On 09/12/2013 07:08 PM, David Boreham wrote:
On 9/11/2013 11:41 AM, Howard Chu wrote:

Just out of curiosity, why is keeping a count per key a problem? If you're using BDB duplicate key support, can't you just use cursor->c_count() to get this? I.e., BDB already maintains key counts internally, why not leverage that?


afaik you need to pass the DB_RECNUM flag at DB creation time to get record counting behavior, and it imposes a performance and concurrency penalty on writes. Also afaik 389DS does not set that flag except on VLV indexes (which need it, and coincidentally were the original reason for the feature being added to BDB).

I'm using bdb 4.7 on RHEL 6.
Looking at the code, it appears the dbc->count method for btree is __bamc_count() in bt_cursor.c. I'm not sure, but it looks as though this function has to iterate over each page, counting the duplicates on that page, which makes it a non-starter. Unless I'm mistaken, it does not keep a counter that is updated on each write and then simply returned. I don't see any code that would make the behavior differ depending on whether DB_RECNUM is used when the database is created.

The DB_RECNUM count feature is not accessed via dbc->count() but through the dbc->c_get() call, passing DB_GET_RECNO, positioning at the last key. You do also need to use nested btrees for it to count the dups, afaik (but we're doing that in the DS indexes already I believe).

I wrote a small bdbtest.py script which uses the python bdb interface.
https://github.com/richm/scripts/blob/master/bdbtest.py

This creates an env, opens a db with bsddb.db.DB_DUPSORT|bsddb.db.DB_RECNUM, adds several non-dup and dup records, opens a cursor and iterates them. This is the output:

open dbenv in /var/tmp/dbtest
open db /var/tmp/dbtest/dbtest.db4
no txn records
    key=key0 val=data0
    extra=('', '\x01\x00\x00\x00')
<snip>
    key=key9 val=data9
    extra=('', '\n\x00\x00\x00')
    key=multikey val=multidata0
    extra=('', '\x0b\x00\x00\x00')
<snip>
    key=multikey val=multidata9
    extra=('', '\x0b\x00\x00\x00')

The extra is the str() output of cur.get(bsddb.db.DB_GET_RECNO)
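A minimal Python 3 sketch for reading those "extra" strings as integers. It assumes DB_GET_RECNO returns the record number as a 4-byte unsigned integer in native byte order and that the RHEL 6 test host is little-endian x86; the `samples` dict and `decode_recno` helper are hypothetical, with the bytes copied from the output above.

```python
import struct

# Raw "extra" values copied from the bdbtest.py output.
# Assumption: the record number is a 4-byte unsigned int in native
# byte order, and the host is little-endian, hence the "<I" format.
samples = {
    "key0": b"\x01\x00\x00\x00",
    "key9": b"\n\x00\x00\x00",
    "multikey": b"\x0b\x00\x00\x00",
}


def decode_recno(raw: bytes) -> int:
    """Decode a 4-byte little-endian unsigned record number."""
    (recno,) = struct.unpack("<I", raw)
    return recno


for key, raw in samples.items():
    print(f"{key}: recno={decode_recno(raw)}")
```

Read this way, key0 through key9 occupy record numbers 1 through 10, and every record in the multikey duplicate set reports the same record number, 11, rather than each duplicate getting its own number.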

So for all of the dup records, the recno is the same: '\x0b' == 11?

I'm probably missing something, but how do I use this to get the number of duplicates?




--
389-devel mailing list
389-devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-devel