On Tue, Oct 29, 2013 at 7:06 PM, Niels Ole Salscheider
<niels_...@salscheider-online.de> wrote:
> Hi Tom,
>
> this has been on my todo list for quite a while.
>
> Your patch looks good to me, but in my experience a block with approximately
> the same size for each dimension gives slightly better performance in many
> cases when compared to one where one dimension is significantly larger.
> Maybe you could initialise the size for each dimension to 1 and multiply them
> by 2 in a round-robin fashion as long as feasible.
>
> Regards,
>
> Ole

Either that, or use a greatest common factor algorithm to find the
GCF of the maximum work group size and the global work size. The main
thing that stuck out when I looked at this before was that when the
global size isn't a power of two, we might end up with local work
group sizes that are smaller than necessary.
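
As a rough sketch of the GCF idea (the names gcf() and
pick_local_size() are made up for illustration and are not taken from
Tom's patch or from Mesa; it assumes every global size and the maximum
work group size are at least 1):

```c
#include <stddef.h>

/* Euclid's method: greatest common factor of a and b. */
static size_t gcf(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Hypothetical helper: choose a local size for each dimension that
 * divides the global size evenly, while keeping the product of all
 * local sizes within max_wg_size.
 */
static void pick_local_size(unsigned dims, const size_t *global,
                            size_t max_wg_size, size_t *local)
{
    size_t budget = max_wg_size;
    unsigned d;

    for (d = 0; d < dims; d++) {
        /* The result divides both global[d] and the remaining budget. */
        local[d] = gcf(global[d], budget);
        budget /= local[d];
    }
}
```

With a 48x48 global size and a limit of 256 this picks 16x16. Note
that it only finds factors shared with the limit, so a global size of
100 with a limit of 256 gives 4 rather than the largest divisor that
would fit.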

Feel free to borrow from the following if it's at all useful (an
implementation of Euclid's method):
https://github.com/awatry/libvpx.opencl/blob/master/vp8/common/opencl/vp8_opencl.h#L89

That being said, what Tom's got is a definite improvement over what we
have now. I haven't experimented much with round-robin growth of the
dimension sizes, because the algorithm I've done most of my GPU work
on has had little flexibility in work group size.
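
For reference, the round-robin doubling Ole describes could look
roughly like the following. This is an untested-on-hardware sketch,
not part of the patch; round_robin_local_size() is a made-up name, and
"as long as feasible" is assumed here to mean that the doubled size
still divides the global size and the total stays within the maximum
work group size:

```c
#include <stddef.h>

/*
 * Hypothetical helper: start every dimension at 1 and double the
 * dimensions in turn for as long as the doubled size still divides
 * the global size and the total stays within max_wg_size.
 */
static void round_robin_local_size(unsigned dims, const size_t *global,
                                   size_t max_wg_size, size_t *local)
{
    size_t total = 1;
    unsigned d, stalled = 0;

    for (d = 0; d < dims; d++)
        local[d] = 1;

    /* Stop once every dimension in a row has refused a doubling. */
    for (d = 0; stalled < dims; d = (d + 1) % dims) {
        size_t next = local[d] * 2;

        if (total * 2 <= max_wg_size && global[d] % next == 0) {
            local[d] = next;
            total *= 2;
            stalled = 0;
        } else {
            stalled++;
        }
    }
}
```

For a 64x64 global size and a limit of 256 this ends up at 16x16; for
100x64 it gives 4x64, since 100 is not divisible by 8.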

Regardless of what we end up with, this patch looks good to me. We
can improve upon it later if needed. Note that it will probably have
to be rebased on top of some of Francisco's recent work.

--Aaron

> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev