On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí <tan...@gmail.com> wrote:
> On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.bl...@gmail.com>
> > Regarding performance, khash uses open addressing, which requires more key
> > comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)).
> > However, any measurable differences will most likely be dwarfed by IO costs
> > in this particular use case.
> I don't think this is true. If you actually run a couple insertion and
> lookup benchmarks comparing the two implementations, you'll find khash
> to be around ~30% faster for most workloads (venturing a guess from
> past experience). I am obviously not the author of khash, but I've
> found that the theoretical increase in key comparisons is completely
> dwarfed by the benefit of increased cache locality during the probing
> fase. I still haven't found a faster C hash table implementation than
> khash for the general case, that's why I normally use it despite the
> worrisome preprocessor crash-party going on in that header file.
Yes, cache locality is where open addressing shines, however, you
loose that benefit when the keys are pointers (e.g. sha1's).
> > Btw., pack-objects.c::rehash_objects() in patch 03 unnecessarily checks for
> > duplicates. That's probably the reason for the high hashcmp times you found
> > in the first round of the patch series.
> Patch 03 is a refactoring -- the duplicate checking code has been in
> pack-objects.c for years. I am not sure duplicate checking is
> superfluous at all, there are many cases where you could be
> double-inserting objects in a pack-objects run, and you really don't
> want to generate packfiles with dupe objects.
The point is that its in _rehash_. Duplicate checking should be in
add/put. Rehash only rearranges entries that are alread _in_ the map,
and it usually only needs the hash code for that. So pack-objects
implements rehash in terms of a full clear + add-all instead, which is
of course slower than what khash, hashmap etc. would do.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html