Hi all,

Out of curiosity I wrote a script that dumps the Subversion BDB tables, and I found an interesting anomaly in the "strings" table: every string there has a duplicate entry with an empty value. It is my understanding that "strings" allows duplicate keys so that very large content can be stored in chunks under the same key. That's fine. But why does every small string (like a file name) have a duplicate key? It looks like a bug to me. It does not prevent normal functioning, because the chunks are concatenated when read and an empty value does no harm, but from a performance point of view, having twice as many nodes in the btree is not good.
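To make the point concrete, here is a minimal sketch in plain Ruby (no bdb needed). It only models my understanding of the behaviour and is not Subversion's actual code; the names STRINGS, read_string and keys_with_empty_chunk are made up for illustration. It shows why the empty chunk is harmless on read, yet still costs an extra record per key.

```ruby
# Hypothetical in-memory model of the "strings" table: an ordered list of
# [key, value] records in which a key may repeat (like BDB duplicates).
STRINGS = [
  ['0', ''],
  ['0', '((test.txt 5 1.0.1))'],
  ['1', ''],
  ['1', 'aaa'],
]

# Read a string by concatenating every chunk stored under its key, in order.
# (Assumption: this mirrors how chunks are joined on read.)
def read_string(records, key)
  records.select { |k, _| k == key }.map { |_, v| v }.join
end

# Keys that have more than one record, at least one of them empty.
def keys_with_empty_chunk(records)
  records.group_by(&:first)
         .select { |_, recs| recs.size > 1 && recs.any? { |_, v| v.empty? } }
         .keys
end

puts read_string(STRINGS, '0')
puts read_string(STRINGS, '1')
p keys_with_empty_chunk(STRINGS)
```

In this model each string reads back correctly, but every key is reported as carrying an empty chunk, so the table holds twice as many records as it needs.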
Here is the dump I'm talking about:

=========== nodes ================
k:'0.0.0' v:'((dir 1 / 0 1 0) 0 0 )'
k:'0.0.1' v:'((dir 1 / 5 0.0.0 1 1 1 0 1 0) 0 1 0)'
k:'1.0.1' v:'((file 9 /test.txt 0 1 0 1 0 1 0) 0 1 1)'
k:'next-key' v:'2'
=========== strings ================
k:'0' v:''
k:'0' v:'((test.txt 5 1.0.1))'
k:'1' v:''
k:'1' v:'aaa'
k:'next-key' v:'2'
=========== revisions ================
k:'1' v:'(revision 1 0)'
k:'2' v:'(revision 1 1)'

Pay attention to the "strings" table: an empty value is repeated for every string key.

My environment:

svn, version 1.6.5 (r38866)
Linux ubuntu 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 16:20:31 UTC 2009 i686 GNU/Linux

Here is the script:

===========================================================
#!/usr/bin/ruby
require 'bdb'

# Open the repository's BDB environment.
$env = BDB::Env.open('repo3/db', flags=BDB::INIT_MPOOL, mode=0)

# Print every key/value record of one table.
def list_content(file, db_type)
  puts "=========== #{file} ================"
  db = $env.open_db(db_type, name=file)
  db.each do |k,v|
    puts "k:'#{k}' v:'#{v}'"
  end
  db.close
end

# checksum-reps
%w(changes copies nodes node-origins miscellaneous representations strings transactions).
  each{|f| list_content(f, BDB::BTREE) }

%w(revisions uuids).
  each{|f| list_content(f, BDB::RECNO) }
===========================================================

-- 
From RFC 2631: In ASN.1, EXPLICIT tagging is implicit unless IMPLICIT is explicitly specified