On 01/20/2016 12:45 PM, Ilia Mirkin wrote:
On Wed, Jan 20, 2016 at 4:44 AM, Tapani Pälli <tapani.pa...@intel.com> wrote:
On 01/20/2016 11:31 AM, Ilia Mirkin wrote:
On Wed, Jan 20, 2016 at 4:22 AM, Tapani Pälli <tapani.pa...@intel.com>
wrote:
On 01/20/2016 11:16 AM, Ilia Mirkin wrote:
On Wed, Jan 20, 2016 at 4:09 AM, Tapani Pälli <tapani.pa...@intel.com>
wrote:
On 01/20/2016 10:26 AM, Ilia Mirkin wrote:
On Tue, Jan 19, 2016 at 6:35 AM, Tapani Pälli <tapani.pa...@intel.com>
wrote:
On 01/19/2016 01:14 PM, Ilia Mirkin wrote:
The data structure is a (memory) heap... there appears to be one in
mesa/main/mm.h. There's also one in nouveau_heap.h which is quite
simple and totally unreliant on nouveau, just happens to be there.
How
hard would it be to integrate something like that?
The trouble with adding slow things is that you forget about them,
and
they're not _that_ slow, but this stuff adds up.
The solution I had in mind is to build a list of empty slots when
allocating
remaptable or while finding slots (keep pushing unused empty slots to
list)
... but if possible I would prefer optimization later. First of all,
this
is
quite exotic path to hit with a real program (last words ... yes
yes).
Secondly, and more importantly, we can apply for certification
sooner,
there
are very few failures left.
I see you pushed this patch without concluding this discussion.
Certification may be something that you (personally, as a company,
whatever) are striving for, but that doesn't mean that you get to
ignore reviewer feedback.
I'm sorry if you have that impression but I'm not ignoring review
feedback.
I agree that the find function is not 'optimal' and have planned how to
optimize it and I'm happy with any changes if someone wants to optimize
and
refactor it instead. However, I've noticed this to be not a bottleneck
and
cold path so because of the schedule I'm asking to do this later.
Perhaps in the end you're actually right, I don't know, but we
certainly didn't agree on anything. I'm inclined to push out a revert
while this is being sorted out.
I'm surprised to see this as such a big deal.
// Tapani
The big deal is pushing the patch before concluding the discussion.
Getting back to the matter at hand, what's the absolute worst case
here? How big does the UniformRemapTable get? How many times can this
function get called?
As example with Intel Haswell we have max as 98304, this is the biggest
size
with HSW.
This function gets called only if the remaptable has 'holes' in it,
meaning
that explicit uniforms locations get scattered in this available space, I
consider this very rare for anyone or some engine to do. It could only
really happen if you use both explicit locations (non continuous
locations)
and implicit locations together.
So... what's the worst case? What would that test look like? How long
would it take to execute?
A shader that has max amount of uniforms possible. They need to be invidual
(not arrays or structs), declare half of them with explicit location so that
they fill every other location (leaving as much holes as possible), this
should be the worst case.
I played around a bunch with this. I couldn't trigger the badness, and
then I realized it's because you have a bug: You need to use "j - i <
entries", otherwise you end up picking the wrong location.
True, thanks for finding this out.
I also hit another issue when trying to create a bad shader, with
something like:
layout (location=0) uniform Foo {
layout (location=0) float a;
vec4 ff[2048];
};
I end up hitting
link_uniforms.cpp:1241: void
link_assign_uniform_locations(gl_shader_program*, unsigned int,
unsigned int): Assertion `prog->UniformRemapTable[element_loc] ==
((gl_uniform_storage *) -1)' failed.
Not sure if that's related to your work or not.
This looks like unrelated bug, we should probably catch this in parser
already (?)
However it dawns on me that I had misunderstood your algorithm. I
thought it was triggering O(n^2) behaviour *for each search*, but it
is not -- it's, in essence, a linear scan over the remap table. There
could, of course, be a bunch of these scans. And you skip the process
if the remap table is complete, so it really has to be a bunch of
holes in the thing.
Of course the size is finite, so at worst you'd have a remap table of
size N with N/2 holes, and you'd end up scanning up to the first hole
N/2 times. Which is roughly N^2/4, which is still 4M iterations for N
= 4096.
Unfortunately putting such a shader together is a bit of a pain, since
all the uniforms have to be used. I still really think you need to
build a heap. Or at least store a "first empty slot" so that you don't
repeat the iteration *every* time.
Yeah, I agree "first empty slot" would be good addition and will already
help a lot in problematic cases.
// Tapani
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev