On Fri, Jun 26, 2015 at 09:58:06AM +0200, jeremy rosen wrote:
> On Fri, Jun 26, 2015 at 6:57 AM, Bruce Guenter <br...@untroubled.org> wrote:
> 
> > In fact, it could potentially (long term) eliminate the need for
> > process_cl() as well, with compilers gaining support for offloading work
> > to accelerators. As I understand it, this is a big push for AMD's APUs.
> >
> that would be awesome, but at this stage it's still science-fiction, so
>  let's keep an eye on it but not count on it just yet

I was actually mixing up OpenMP and OpenACC. GCC v5 apparently does
support OpenACC, with offloading to CUDA devices. It can also offload
OpenMP 4 to Intel's Xeon Phi, so I don't think there's any technical
reason it couldn't offload to other accelerators, but that's not here yet.
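
For reference, a rough sketch of what an OpenMP 4 offloaded loop might
look like. The function name scale_offload is made up for illustration
and is not darktable code. Whether the loop really runs on an accelerator
depends on how the compiler was built and configured; without OpenMP
enabled the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cstddef>

// Hypothetical example: multiply every element of buf by k.
// "omp target" asks the compiler to run the region on an offload device
// if one is available; map(tofrom: ...) copies the buffer to the device
// before the region and back to the host afterwards.
void scale_offload(float *buf, std::size_t n, float k)
{
    #pragma omp target map(tofrom: buf[0:n])
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}
```

The result is the same whether or not offloading happens, which is the
appeal: the source stays plain C/C++ with annotations.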

> > Will it? Or will it only be using the best intrinsics set on the system
> > it was built, which will of necessity need to be the least common
> > denominator for binary distributions? That is, do any of the existing
> > compilers build multiple code paths and choose at runtime based on
> > processor features? I've not heard of that, but I'd love to be
> > corrected.
> >
> I think it does, but that's something we need to check.

Here's what I've found out: GCC does support function multiversioning
(https://gcc.gnu.org/wiki/FunctionMultiVersioning), which lets several
versions of a function share one name, each built for a different
architecture, with the right one dispatched at run time.  The same
function marked with __attribute__((target("X"))) is indeed compiled
differently within the same source when X is SSE2 vs AVX, taking
advantage of the different instruction sets.
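
A minimal sketch of what this looks like, assuming GCC (or a compatible
compiler) on x86 Linux. The function name scale and its body are
hypothetical, chosen only to illustrate the attribute; on other
platforms the sketch falls back to a single plain definition.

```cpp
#include <cstddef>

#if defined(__GNUC__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))

// Fallback version, used when no more specific version matches the CPU.
__attribute__((target("default")))
void scale(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

// Same source, but the compiler may use SSE2 instructions here.
__attribute__((target("sse2")))
void scale(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

// Same source again, compiled with AVX available.
__attribute__((target("avx")))
void scale(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

#else

// Platforms without multiversioning support: one ordinary definition.
void scale(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

#endif
```

Callers just write scale(buf, n, k); which version runs is decided at
run time from the CPU's features. Note that the body is repeated once
per target.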

However, GCC does *NOT* do any of this automatically, or at least I have
not found any way to make it. A function (with or without a vectorized
loop) is emitted exactly once each time it appears, and there are no
internal branches to different (SSE vs AVX) code paths.

So to take full advantage of it we would still need to repeat the
function for each target hardware. In theory this could be done with
macros, but I don't want to write large functions in a #define. This
multiversioning also requires switching the individual source to C++,
but that looks like a fairly minor issue.
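
One possible way to limit the repetition without a large #define,
sketched here as an untested assumption rather than something verified
against darktable: write the body once in an always_inline helper and
keep only thin per-target wrappers. The names scale_body and scale_mv
are hypothetical. The idea is that the helper is inlined into each
wrapper and so compiled with that wrapper's instruction set; whether the
generated code is actually as good would need to be checked.

```cpp
#include <cstddef>

// Shared body, written once. always_inline asks the compiler to inline
// it into every caller, so it is compiled under the caller's target.
__attribute__((always_inline))
static inline void scale_body(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

#if defined(__GNUC__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))

// Thin multiversioned wrappers: only these one-liners are repeated.
__attribute__((target("default")))
void scale_mv(float *buf, std::size_t n, float k) { scale_body(buf, n, k); }

__attribute__((target("sse2")))
void scale_mv(float *buf, std::size_t n, float k) { scale_body(buf, n, k); }

__attribute__((target("avx")))
void scale_mv(float *buf, std::size_t n, float k) { scale_body(buf, n, k); }

#else

// Platforms without multiversioning support: a single plain wrapper.
void scale_mv(float *buf, std::size_t n, float k) { scale_body(buf, n, k); }

#endif
```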

-- 
Bruce Guenter <br...@untroubled.org>                http://untroubled.org/


_______________________________________________
darktable-devel mailing list
darktable-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/darktable-devel
