Hi all,

as the title says. The implementation uses a compute shader to summarize
data from the query buffers. As long as only one query buffer is in flight
(the normal case), that compute shader is launched exactly once, on a
single thread. If multiple buffers are required, one compute grid is
launched for each of these buffers, in sequence.
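
To make the chaining concrete, here is a minimal CPU-side sketch in C of
what the summarization amounts to. It is not the shader from the series:
the struct layouts and the names result_pair, query_buffer,
summarize_buffer and summarize_query are hypothetical, and it assumes each
result is a (begin, end) counter pair whose difference is accumulated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one (begin, end) counter pair per result slot. */
struct result_pair {
    uint64_t begin;
    uint64_t end;
};

/* Hypothetical view of one query buffer. */
struct query_buffer {
    const struct result_pair *pairs;
    size_t num_pairs;
};

/* Summarize a single buffer: add the (end - begin) delta of every pair to
 * the running total carried over from the previously processed buffer. */
static uint64_t summarize_buffer(const struct result_pair *pairs,
                                 size_t num_pairs, uint64_t carry)
{
    uint64_t total = carry;

    for (size_t i = 0; i < num_pairs; i++)
        total += pairs[i].end - pairs[i].begin;
    return total;
}

/* Process the buffers strictly in sequence, mirroring one launch per
 * buffer, with each step consuming the previous step's result. */
static uint64_t summarize_query(const struct query_buffer *bufs,
                                size_t num_bufs)
{
    uint64_t total = 0;

    for (size_t i = 0; i < num_bufs; i++)
        total = summarize_buffer(bufs[i].pairs, bufs[i].num_pairs, total);
    return total;
}
```

With a single buffer in flight, summarize_query reduces to one call of
summarize_buffer, which corresponds to the single-launch case.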

All of this could be done in much fancier ways using bindless buffers and
wave-wide computations, but really, the expectation is that most queries
will be rather simple (though occlusion queries always contain at least 8
result pairs, so it's not like it would be completely pointless).

This code also exposes the hilarious lowering of 64-bit integer divides
in LLVM, since timestamp queries use one. This lowering generates more than
2KB of code for a single division, which is excessive even when the division
*isn't* by a constant. The right place to fix this is in LLVM, and I'm
already looking into it. For normal queries this is completely irrelevant
because the code will just be skipped.
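
For illustration, the kind of computation that needs such a divide might
look like the following C sketch. The formula, the function name
ticks_to_ns and the clock_khz parameter are assumptions made for this
example, not taken from the patches: it converts raw clock ticks to
nanoseconds given a clock frequency in kHz.

```c
#include <stdint.h>

/* Convert raw clock ticks to nanoseconds (assumed formula).
 *
 *   ns = ticks * 10^9 / (clock_khz * 10^3) = ticks * 10^6 / clock_khz
 *
 * clock_khz must be nonzero. The multiplication wraps around for very
 * large tick counts (above roughly 1.8e13); this sketch ignores that. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t clock_khz)
{
    return ticks * UINT64_C(1000000) / clock_khz;
}
```

The single 64-bit division in the return statement is the operation whose
lowering is discussed above.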

Please review!

mesa-dev mailing list
