2018-04-12 20:07 GMT+02:00 Eric Anholt <e...@anholt.net>:
> Erik Faye-Lund <kusmab...@gmail.com> writes:
>> On Wed, Apr 11, 2018 at 8:48 PM, Thomas Helland
>> <thomashellan...@gmail.com> wrote:
>>> This series came about when I saw a talk online, while simultaneously
>>> being annoyd about the needless waste of memory in our set as reported
>>> by pahole. I have previously made some patches that changed our hash
>>> table from a reprobing one to a quadratic probing one, in the name of
>>> lower overhead and better cache locality, but I was not quite satisfied.
>>> I'm sending this series out now, as it seems like an ideal time since
>>> Timothy is working at reducing our compile times. Further details about
>>> the implementation and its advantages are described in the patches.
>>> I've found this to give a reduction in shader-db runtime of about 2%,
>>> but I have to do some more testing on my main computer, as my laptop
>>> is showing its age with some terrible thermal issues.
>>> This special cases on pointers, as that is a very common usecase.
>>> This allows us to drop some comparisons, and reduce the total size
>>> of our hash table to 70% or our current and the set to 50%. It uses
>>> linear probing and power-of-two table sizes to get good cache locality.
>>> In the pointer_map caes it moves the stored hashes out into it's own
>>> array for even better cache locality.
>>> I'm not sure if we want another set and map amongst our utils,
>>> but the patch series is simple enough, and complete enough,
>>> that I thought I could share it for some inital comments.
>> This approach gives me a bad feeling. Using memory addresses for
>> storage ordering in a compiler is not quite nice; it can easily mask
>> spurious bugs, and have a compiler produce different result each run.
>> Such compilers are not nice to work with. I've seen *exactly* this
>> use-case go wrong in the past.
> I've got bad news for you about what we're already doing in
> _mesa_hash_pointer().
> I'm generally interested in this series, though a completely new
> implementation without unit tests is less interesting to me.  How much
> do we get from just having a pointer map forked off of hash_table.c with
> the ->hash removed?

Yeah, it would obviously need to be accompanied by a bunch of tests.
I'm not even sure that this is the best way. We might, as you suggest,
want to settle on something in between. I should have probably
mentioned that when I run shader-db under "perf stat" the number of
executed instructions is roughly the same, but stalled cycles is reduced.
This lead me to the conclusion that the speedups are mostly
related to better cache locality and possibly also better speculative
code execution on the processor side. I see multiple alternatives;

- Fork of the existing hash table with hash removed
- Change the existing hash table to quadratic probing
  (I believe I tested this recently and concluded there where no gains)
- Change the existing hash table to storing the hash in a separate array
  (This should still give us some of the advantages)
- Separate array for hash + linear probing in existing hash table
- Some other combination of the above

I believe at least some of these alternatives will lead to having to
rewrite some uses of the hash table, as they assume things about
the implementation, although I can't quite remember. I can hack
together some other alternatives to see where that gets us.
Obviously, If we can modify the existing implementation and get
most of the benefit of the other changes that might be preferred,
as we would get all uses in one go, instead of having to migrate.
mesa-dev mailing list

Reply via email to