On Saturday, 28 December 2013 at 20:20:36 UTC, Benjamin Thaut
wrote:
Because I was always curious to do some hashmap profiling with
real data, I did some more of it. Here are the results:
My implementation. Power of two size (25% free):
building hashmap: 8 seconds 28880605 ticks
looking entries up: 0 seconds 31685 ticks

My implementation. Prime number size (25% free):
building hashmap: 8 seconds 26802252 ticks
looking entries up: 0 seconds 27719 ticks

My implementation. Prime number size (10% free):
building hashmap: 8 seconds 29323952 ticks
looking entries up: 0 seconds 32768 ticks

My implementation. Prime number size (20% free):
26877474 entries
building hashmap: 8 seconds 28895211 ticks
sum 74128
looking entries up: 0 seconds 33222 ticks

My implementation. Prime number size (50% free):
building hashmap: 8 seconds 27738475 ticks
looking entries up: 0 seconds 25696 ticks

OrderedAA (implementation of Daniel Kozak):
building array: 13 seconds 45663219 ticks
lookup: 0 seconds 236323 ticks

Builtin AA:
building array: 14 seconds 47175035 ticks
lookup: 0 seconds 33692 ticks
You can see that both my implementation and Daniel Kozak's
outperform the builtin AA when filling the AA. Both my version
and Daniel Kozak's preallocate the internal array, although
Daniel Kozak's version does one additional allocation per entry
to build the doubly linked lists. Even without preallocation,
my implementation still outperforms the builtin AA.
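
To make the allocation difference concrete, here is a simplified
sketch of the two layouts (not the actual code of either
implementation; FlatTable and NodeTable are just illustrative names):

struct FlatTable(K, V)
{
    static struct Slot { K key; V value; bool used; }
    Slot[] slots;

    this(size_t expectedEntries)
    {
        // one allocation up front, sized so that ~25% of the slots stay free
        slots = new Slot[expectedEntries + expectedEntries / 3];
    }
}

struct NodeTable(K, V)
{
    static struct Node { K key; V value; Node* prev, next; }
    Node*[] buckets;

    this(size_t expectedEntries)
    {
        buckets = new Node*[expectedEntries]; // preallocated bucket heads
    }

    void insert(K key, V value)
    {
        auto idx = hashOf(key) % buckets.length;
        auto n = new Node;            // one additional allocation per entry
        n.key = key;
        n.value = value;
        n.next = buckets[idx];        // link into the doubly linked list
        if (n.next !is null)
            n.next.prev = n;
        buckets[idx] = n;
    }
}

unittest
{
    auto flat  = FlatTable!(string, int)(1000);
    auto nodes = NodeTable!(string, int)(1000);
    nodes.insert("dlang", 1);
    assert(flat.slots.length == 1333);
}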
When looking up data, my implementation outperforms both other
implementations, although it is only slightly faster than the
builtin one. Daniel Kozak's version is 8 times slower during
lookup compared to my implementation. I think this is mostly
due to cache misses caused by the linked-list storage of the
entries.
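
The lookup access patterns differ roughly like in the following
sketch (again simplified and hypothetical, not the real code):
linear probing walks adjacent slots of one flat array, while the
chained version chases a pointer to a separately allocated node on
every step.

struct Slot { string key; int value; bool used; }
struct Node { string key; int value; Node* prev, next; }

int* lookupFlat(Slot[] slots, string key)
{
    auto i = hashOf(key) % slots.length;
    while (slots[i].used)
    {
        if (slots[i].key == key)
            return &slots[i].value;       // the data sits in the array itself
        i = (i + 1) % slots.length;       // next probe is adjacent in memory
    }
    return null;                          // hit an empty slot: key not present
}

int* lookupChained(Node*[] buckets, string key)
{
    for (auto n = buckets[hashOf(key) % buckets.length]; n !is null; n = n.next)
        if (n.key == key)
            return &n.value;              // every n.next hop may be a cache miss
    return null;
}

unittest
{
    auto slots = new Slot[8];
    slots[hashOf("dlang") % slots.length] = Slot("dlang", 42, true);
    assert(*lookupFlat(slots, "dlang") == 42);

    auto buckets = new Node*[8];
    auto n = new Node;
    n.key = "dlang";
    n.value = 42;
    buckets[hashOf("dlang") % buckets.length] = n;
    assert(*lookupChained(buckets, "dlang") == 42);
}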
It is also interesting to note that using prime number sizes
for the internal array of the AA slightly improved the lookup
performance in my implementation. A certain portion of the
entries in the internal array is always kept free. Reducing
this amount leads to worse lookup performance; increasing it
gives better performance. You can gain a few percent of speed
if you are willing to waste 50% of the memory. My measurements
show that 25% free entries seems to be the sweet spot.
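
For reference, the index computation and the "keep N% free" growth
policy I am talking about look roughly like this (simplified sketch;
the actual prime selection and hash function are left out):

// table sized as a power of two: the slot index is a cheap bit mask
size_t indexPowerOfTwo(size_t hash, size_t tableLength)
{
    return hash & (tableLength - 1);      // tableLength must be 2^n
}

// table sized with a prime: a modulo, slower per lookup, but it spreads
// poor hash bits more evenly across the slots
size_t indexPrime(size_t hash, size_t tableLength)
{
    return hash % tableLength;            // tableLength is a prime
}

// growth policy for "keep 25% of the slots free": grow as soon as the
// table is more than 75% full
bool needsGrow(size_t usedSlots, size_t tableLength)
{
    return usedSlots * 4 > tableLength * 3;
}

unittest
{
    assert(indexPowerOfTwo(0x1234, 1024) == 0x234);
    assert(indexPrime(0x1234, 1021) == 576);
    assert(needsGrow(76, 100));
    assert(!needsGrow(75, 100));
}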
This is cool!
Can you post your code and the data you used for this test
somewhere? I would really like to test it on my new AA implementation :)