On Fri, Jun 26, 2015 at 09:58:06AM +0200, jeremy rosen wrote:
> On Fri, Jun 26, 2015 at 6:57 AM, Bruce Guenter <br...@untroubled.org> wrote:
>
> > In fact, it could potentially (long term) eliminate the need for
> > process_cl() as well, with compilers gaining support for offloading work
> > to accelerators. As I understand it, this is a big push for AMD's APUs.
> >
> that would be awesome, but at this stage it's still science-fiction, so
> let's keep an eye on it but not count on it just yet
I was actually mixing up OpenMP and OpenACC. GCC v5 does support OpenACC,
apparently with offloading to CUDA devices. It can offload OpenMP 4 to
Intel's Xeon Phi, so I don't think there's any technical reason it couldn't
offload to other accelerators, but that's not here yet.

> > Will it? Or will it only be using the best intrinsics set on the system
> > it was built, which will of necessity need to be the least common
> > denominator for binary distributions? That is, do any of the existing
> > compilers build multiple code paths and choose at runtime based on
> > processor features? I've not heard of that, but I'd love to be
> > corrected.
> >
> I think it does, but that's something we need to check.

Here's what I've found out:

GCC does support function multiversioning
(https://gcc.gnu.org/wiki/FunctionMultiVersioning), which provides the same
function name for different architectures and dispatches between them at
run time. The same function marked with __attribute__((target("X"))) is
indeed compiled differently within the same source when X is SSE2 vs. AVX,
taking advantage of the different instruction sets.

However, GCC does *NOT* do any of this automatically, or at least I have
not found any way to make it do so. A function (with or without a
vectorized loop) is emitted exactly once each time it appears, and there
are no internal branches to different (SSE vs. AVX) code paths. So to take
full advantage of it we would still need to repeat the function for each
hardware target. In theory this could be done with macros, but I don't want
to write large functions in a #define.

This multiversioning also requires switching the individual source file to
C++, but that looks like a fairly minor issue.

-- 
Bruce Guenter <br...@untroubled.org>                http://untroubled.org/
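For anyone who hasn't seen OpenMP 4 offloading, here is a minimal sketch of what such code looks like, assuming a C++ compiler with OpenMP 4.0 support. saxpy_offload is a hypothetical example function, not darktable code. Whether the region actually runs on an accelerator depends on how the compiler was built; with no device present the runtime falls back to the host, so the result is the same either way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: y = a * x + y, written as an OpenMP 4 "target" region.
// Built with -fopenmp by an offload-capable compiler (e.g. a GCC 5 configured
// for Xeon Phi), the loop may run on the accelerator: x is copied to the
// device, y is copied there and back. With no device available the runtime
// runs the region on the host; without -fopenmp the pragma is ignored and the
// loop simply runs serially on the CPU.
void saxpy_offload(float a, const std::vector<float> &x, std::vector<float> &y)
{
  assert(x.size() == y.size());
  const float *xp = x.data();
  float *yp = y.data();
  const int n = static_cast<int>(y.size());

#pragma omp target teams distribute parallel for map(to : xp[0:n]) map(tofrom : yp[0:n])
  for (int i = 0; i < n; ++i)
    yp[i] = a * xp[i] + yp[i];
}
```

The map clauses are where the data movement cost shows up; for image processing that copy would have to be weighed against the speedup, much as process_cl() already has to.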
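A rough sketch of the multiversioning approach, combined with the macro trick, assuming g++ (or clang) on x86 Linux, since the GCC wiki describes this feature for C++. scale and SCALE_BODY are hypothetical names. The loop body is written once and stamped out per target; that is tolerable for a three-line loop, but it shows why it gets ugly for large functions.

```cpp
#include <cassert>

// Hypothetical shared body: written once, pasted into every target version.
#define SCALE_BODY                \
  {                               \
    assert(n >= 0);               \
    for (int i = 0; i < n; ++i)   \
      v[i] *= k;                  \
  }

#if defined(__GNUC__) && defined(__linux__) && \
    (defined(__x86_64__) || defined(__i386__))
// Function multiversioning: same name, one definition per target attribute.
// The compiler emits a separate copy for each target (so the loop can be
// vectorized with SSE2 or AVX) plus a dispatcher that picks the best copy
// for the CPU the program is running on. A "default" version is mandatory.
__attribute__((target("default")))
void scale(float *v, int n, float k) SCALE_BODY

__attribute__((target("sse2")))
void scale(float *v, int n, float k) SCALE_BODY

__attribute__((target("avx")))
void scale(float *v, int n, float k) SCALE_BODY
#else
// Other platforms: a single portable version.
void scale(float *v, int n, float k) SCALE_BODY
#endif
```

Callers just call scale(buf, n, 0.5f); which copy runs is decided at run time, so a binary distribution built this way isn't limited to the least common denominator.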
_______________________________________________
darktable-devel mailing list
darktable-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/darktable-devel