On 09/16/2016 06:57 AM, Nicolai Hähnle wrote:
> Hi all,
> as the title says. The implementation uses a compute shader to summarize
> data from the query buffers. As long as only one query buffer is in flight
> (the normal case), that compute shader is launched exactly once, on a
> single thread. If multiple buffers were required, then one compute grid is
> launched for each of these buffers, in sequence.
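To check my reading of the summarizing step, here is a rough CPU-side sketch in C. It is not the actual shader. The `result_pair` layout (a begin/end counter snapshot), the `query_buffer` struct, and both function names are my own assumptions for illustration. Each call to `summarize_buffer` stands in for one launch, and the running total is carried from one buffer to the next in sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one begin/end counter snapshot per result pair. */
struct result_pair {
    uint64_t begin;
    uint64_t end;
};

/* Hypothetical view of one query buffer as an array of result pairs. */
struct query_buffer {
    const struct result_pair *pairs;
    size_t count;
};

/* Stands in for one launch: fold a single buffer into the running total. */
static uint64_t summarize_buffer(uint64_t acc, const struct query_buffer *buf)
{
    for (size_t i = 0; i < buf->count; i++)
        acc += buf->pairs[i].end - buf->pairs[i].begin;
    return acc;
}

/* One pass per buffer, in order; each pass starts from the previous total. */
static uint64_t summarize_query(const struct query_buffer *bufs, size_t num_bufs)
{
    uint64_t acc = 0;
    for (size_t b = 0; b < num_bufs; b++)
        acc = summarize_buffer(acc, &bufs[b]);
    return acc;
}
```

With a single buffer this is one pass over a handful of pairs, which matches the "launched exactly once, on a single thread" description for the common case.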
> All of this could be done in much fancier ways using bindless buffers and
> wave-wide computations, but really, the expectation is that most queries
> will be rather simple (though occlusion queries always contain at least 8
> result pairs, so it's not like it would be completely pointless).
> This code also exposes the hilarious lowering of 64-bit integer divides
> in LLVM, since timestamp queries use it. This lowering generates more than
> 2KB of code for a single division, which is excessive even when the division
> *isn't* by a constant. The right place to fix this is in LLVM, and I'm
> already looking into it. For normal queries this is completely irrelevant
> because the code will just be skipped.
Is the division by a constant? If it is, you might want to replace it with
a multiply-and-shift sequence, something like what libdivide would generate.
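To make the suggestion concrete, here is a minimal sketch in C of unsigned 64-bit division by an invariant divisor, following the general multiply-high-and-shift scheme from Granlund and Montgomery. It is not libdivide's actual code or API; the names `udiv64`, `udiv64_prepare`, `udiv64_apply`, and `mulhi_u64` are made up for illustration. It also assumes the divisor is known before the shader is built, so the magic constant can be computed once on the CPU.

```c
#include <stdint.h>

/* Precomputed constants for dividing by one fixed divisor (hypothetical type). */
struct udiv64 {
    uint64_t magic;   /* low 64 bits of the 65-bit multiplier */
    unsigned shift1;  /* 0 for d == 1, otherwise 1 */
    unsigned shift2;  /* ceil(log2(d)) - 1, or 0 for d == 1 */
};

/* High 64 bits of the 128-bit product a * b, built from 32-bit halves. */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Sum of the middle terms; this cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Compute the constants for divisor d. The caller must ensure d != 0. */
static struct udiv64 udiv64_prepare(uint64_t d)
{
    struct udiv64 c;
    unsigned l = 0; /* l = ceil(log2(d)) */
    while (l < 64 && ((uint64_t)1 << l) < d)
        l++;

    /* x = 2^l - d, computed modulo 2^64; always in [0, d). */
    uint64_t pow = (l < 64) ? ((uint64_t)1 << l) : 0;
    uint64_t x = pow - d;

    /* q = floor(2^64 * x / d) by bitwise long division. */
    uint64_t q = 0, rem = x;
    for (int i = 0; i < 64; i++) {
        uint64_t carry = rem >> 63; /* bit shifted out of rem */
        rem <<= 1;
        q <<= 1;
        if (carry || rem >= d) {
            rem -= d;
            q |= 1;
        }
    }

    c.magic = q + 1;
    c.shift1 = l ? 1 : 0;
    c.shift2 = l ? l - 1 : 0;
    return c;
}

/* n / d using one multiply-high, one subtract, one add and two shifts. */
static uint64_t udiv64_apply(uint64_t n, struct udiv64 c)
{
    uint64_t t = mulhi_u64(c.magic, n);
    return (t + ((n - t) >> c.shift1)) >> c.shift2;
}
```

Only `udiv64_apply` would need to run per query; the preparation step, including its 64-iteration loop, happens once per divisor. Whether this beats a fixed LLVM lowering depends on how the divisor reaches the shader, which I have not checked.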
> Please review!
> mesa-dev mailing list