Re: [PATCH] Lightning speed Dict hash table

Alexander Malysh Tue, 07 Aug 2012 06:12:19 -0700

Hi Stipe,

this is really complex patch for the hash issue, but it could be made easier 
and almost the same speed.
I'm not really convenient that we need to apply this complex patch...


Here are my simple tests and patch attached:

        Command used:

time ./test/test_dict 

-- old
2012-08-07 14:47:49 [57291] [0] DEBUG: Dict populate phase.
2012-08-07 14:48:16 [57291] [0] INFO: ok, got 200000 entries in dict1.
2012-08-07 14:48:16 [57291] [0] INFO: ok, got 200000 entries in dict2.
2012-08-07 14:48:16 [57291] [0] DEBUG: Dict lookup phase.

real    0m41.842s
user    0m40.680s
sys     0m1.078s

-- sedgewick (my patch)
2012-08-07 14:47:01 [57016] [0] DEBUG: Dict populate phase.
2012-08-07 14:47:03 [57016] [0] INFO: ok, got 200000 entries in dict1.
2012-08-07 14:47:03 [57016] [0] INFO: ok, got 200000 entries in dict2.
2012-08-07 14:47:03 [57016] [0] DEBUG: Dict lookup phase.

real    0m4.160s
user    0m2.998s
sys     0m0.977s

real    0m3.797s
user    0m2.738s
sys     0m1.015s

real    0m3.732s
user    0m2.767s
sys     0m0.938s

-- Bob Jenkins
2012-08-07 14:50:32 [58470] [0] DEBUG: Dict populate phase.
2012-08-07 14:50:34 [58470] [0] INFO: ok, got 200000 entries in dict1.
2012-08-07 14:50:34 [58470] [0] INFO: ok, got 200000 entries in dict2.
2012-08-07 14:50:34 [58470] [0] DEBUG: Dict lookup phase.

real    0m3.747s
user    0m2.682s
sys     0m0.919s

real    0m3.661s
user    0m2.692s
sys     0m0.946s

real    0m3.522s
user    0m2.606s
sys     0m0.908s

Alex

hash_patch.diff
Description: Binary data

Am 06.08.2012 um 15:59 schrieb Stipe Tolj <st...@kannel.org>:

> Hi list,
> 
> this is a new Dict hash table implementation, which modifies the internal 
> hash algorithm we use in the gwlib/dict.[ch] module.
> 
> The short summary is: It is round about 7,5-8 times faster then the existing 
> implementation!
> 
> Now, the explanation part:
> 
> Running the SVN trunk's test/test_dict on a machine I was benchmarking 
> against gives us the following runtime results:
> 
> Average execution speed: 52,x sec.
> OProfile's opreport for test_dict:
> 
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 19000    39.3954  test_dict                item_has_key
> 17563    36.4158  test_dict                seems_valid_real
> 4528      9.3885  test_dict                gwlist_search
> 3510      7.2778  test_dict                octstr_compare
> 456       0.9455  libc-2.11.3.so           vfprintf
> 251       0.5204  libc-2.11.3.so           _int_malloc
> 175       0.3629  libc-2.11.3.so           random_r
> 159       0.3297  libc-2.11.3.so           __i686.get_pc_thunk.bx
> 150       0.3110  test_dict                octstr_len
> 148       0.3069  test_dict                octstr_get_char
> 139       0.2882  libc-2.11.3.so           random
> 
> (BTW, this was on a VMware VM, so we used timer interrupts, but the relative 
> results on a real machine are the same.)
> 
> So, what does it tell us?
> 
> First of all, most CPU cycles are used for the item_has_key() function 
> internally, which is at the end of the day nothing more then a 
> octstr_compare(). We compare the hash's key Octstr here while searching for 
> the one we're looking for.
> 
> The "real problem" about the current implementation is the hashing of the 
> keys. This happens within key_to_index() and we use this implementation:
> 
> static long key_to_index(Dict *dict, Octstr *key)
> {
>    return octstr_hash_key(key) % dict->size;
> }
> 
> where octstr_hash_key() accumulates the single byte values to one long and we 
> modulo against the dict size.
> 
> That's ok, for small Dicts. But when the Dict size get's bigger and bigger, 
> we'll notice un-leveled congestion in the hash table. Which means for some 
> hash table buckets we'll have larger chained items in the bucket list, then 
> for others.
> 
> This is also where the gwlist_search() hits us from the OProfile results. We 
> spend 10% by simply looking for the right item in the hash bucket list.
> 
> So, the key to performance is having a well distributed hash function, which 
> make the overall average list len of the hash table bucket minimal, and well 
> distributed.
> 
> The new implementation does this. It is based on the code of Bob Jenkins, 
> which was released to the public domain. I have incorporated the parts into 
> our existing gwlib/dict.[ch] structure, and also added some "debugging 
> capabilities" to the structure itself.
> 
> Ok, you want some figures now? Here is the same test run of test_dict on the 
> same machine, under same conditions:
> 
> Average execution speed: 6,7 sec.
> OProfile's opreport for test_dict:
> 
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        image name               symbol name
> 568      14.2036  libc-2.11.3.so           vfprintf
> 310       7.7519  libc-2.11.3.so           random
> 273       6.8267  libc-2.11.3.so           random_r
> 193       4.8262  libc-2.11.3.so           __i686.get_pc_thunk.bx
> 191       4.7762  libc-2.11.3.so           _int_malloc
> 185       4.6262  test_dict                lock
> 179       4.4761  libpthread-2.11.3.so     pthread_mutex_lock
> 149       3.7259  test_dict                seems_valid_real
> 121       3.0258  libc-2.11.3.so           _IO_default_xsputn
> 87        2.1755  test_dict                dict_put_normal
> 81        2.0255  [vdso] (tgid:15020 range:0xb7781000-0xb7782000) [vdso] 
> (tgid:15020 range:0xb7781000-0xb7782000)
> 81        2.0255  [vdso] (tgid:15024 range:0xb76fb000-0xb76fc000) [vdso] 
> (tgid:15024 range:0xb76fb000-0xb76fc000)
> 76        1.9005  libc-2.11.3.so           _itoa_word
> 76        1.9005  libc-2.11.3.so           strchrnul
> 75        1.8755  libpthread-2.11.3.so     __pthread_mutex_unlock_usercnt
> 66        1.6504  test_dict                octstr_compare
> 63        1.5754  test_dict                get_random_fd
> 62        1.5504  libc-2.11.3.so           _int_free
> 62        1.5504  libc-2.11.3.so           malloc
> 61        1.5254  test_dict                get_random_bytes
> 59        1.4754  libc-2.11.3.so           free
> 53        1.3253  test_dict                dict_destroy
> 53        1.3253  test_dict                gwlist_search
> 51        1.2753  test_dict                item_has_key
> 50        1.2503  test_dict                gwlist_extract_first
> 
> For those that identify the parts, it speaks for itself.
> 
> Please review and vote for commit.
> 
> BTW, this code is in production for a huge majority of the commercial Kannel 
> SMPP v3.4 server (smppbox) users around the globe and has been proven to be 
> rock stable since introduction.
> 
> Stipe
> 
> -- 
> -------------------------------------------------------------------
> Kölner Landstrasse 419
> 40589 Düsseldorf, NRW, Germany
> 
> tolj.org system architecture      Kannel Software Foundation (KSF)
> http://www.tolj.org/              http://www.kannel.org/
> 
> mailto:st_{at}_tolj.org           mailto:stolj_{at}_kannel.org
> -------------------------------------------------------------------
> <gateway-dict.diff>

Re: [PATCH] Lightning speed Dict hash table

Reply via email to