Am 24.09.2013 13:16, schrieb Tay Ray Chuan:
> Hi Karsten,
> 
> On Tue, Sep 24, 2013 at 5:50 PM, Karsten Blees <karsten.bl...@gmail.com> 
> wrote:
>>
>>         |       add        |  get 100% hits  |    get 10% hits
>>         |  hash  | hashmap | hash  | hashmap |  hash   | hashmap
>> --------+--------+---------+-------+---------+---------+--------
>> FNV     | 14.815 |   2.345 | 3.059 |   1.642 |   4.085 |   0.976
>> FNV  x2 | 14.409 |   2.706 | 2.888 |   1.959 |   3.905 |   1.393
>> i       |  7.432 |   1.593 | 1.364 |   1.142 | 413.023 |   0.589
>> i    x2 |  9.169 |   1.866 | 1.427 |   1.163 |   0.757 |   0.670
>> i/10    |  1.800 |   1.555 | 5.365 |   6.465 |  32.918 |   1.052
>> i/10 x2 |  1.892 |   1.555 | 5.386 |   6.474 |   1.123 |   1.206
>>
>> Tests can be reproduced with 'time echo "perfhash[map] <method> 1000" | 
>> ./test-hashmap', see test-hashmap.c for definition of method flags.
> 
> I'm not sure if I'm reading the numbers right, but they look impressive!
> 

The numbers are for 100 million additions / lookups (1,000 rounds รก 100,000 
entries). Considering everything else that happens in git, the hash table 
performance should be insignificant, though.

> If it's not too much trouble, could you put together an API document,
> along the lines of Documentation/technical/api-hash.txt?

Yes, I had already planned to port the documentation to asciidoc. Although in 
my experience, API documentation in the header file tends to better stay in 
sync with code changes (but this only makes real sense with extraction tools 
such as doxygen).

> I could give
> a stab at replacing patience and histogram diff's hash implementation
> with yours.
> 

Open addressing (i.e. distributing conflicting entries to other buckes) *may* 
be faster *if* all data fits into the table (i.e. no pointers to the data are 
used). Scanning such a table (without following pointers) has very high 
locality and thus may benefit from accessing fewer CPU cache lines. The 
patience implementation seems to fall into this category (although the entry 
struct is fairly large, and it also uses the *2 trick to defeat bad hash codes 
(which wouldn't be necessary with chaining)).

Both patience and histogram use preallocated, fixed-size hash tables, and thus 
won't benefit from faster inserts (the 'add' performance numbers are for 
dynamically resized hash tables).

So, converting patience/histogram is probably not worth the trouble for 
performance reasons alone. If it also simplifies the algorithms and/or reduces 
memory usage - fine.

Ciao,
Karsten
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to