On Fri, Jun 26, 2015 at 6:33 PM, Bruce Guenter <br...@untroubled.org> wrote:
> On Fri, Jun 26, 2015 at 09:58:06AM +0200, jeremy rosen wrote:
> > On Fri, Jun 26, 2015 at 6:57 AM, Bruce Guenter <br...@untroubled.org> wrote:
> >
> > > In fact, it could potentially (long term) eliminate the need for
> > > process_cl() as well, with compilers gaining support for offloading
> > > work to accelerators. As I understand it, this is a big push for
> > > AMD's APUs.
> > >
> > That would be awesome, but at this stage it's still science fiction, so
> > let's keep an eye on it but not count on it just yet.
>
> I was actually mixing up OpenMP and OpenACC. GCC v5 does support OpenACC
> apparently with offloading to CUDA devices. It can offload OpenMP 4 to
> Intel's Xeon Phi, so I don't think there's any technical reason it
> couldn't offload to other accelerators, but that's not here yet.
>
> > > Will it? Or will it only be using the best intrinsics set on the system
> > > it was built, which will of necessity need to be the least common
> > > denominator for binary distributions? That is, do any of the existing
> > > compilers build multiple code paths and choose at runtime based on
> > > processor features? I've not heard of that, but I'd love to be
> > > corrected.
> > >
> > I think it does, but that's something we need to check.
>
> Here's what I've found out: GCC does support function multiversioning
> (https://gcc.gnu.org/wiki/FunctionMultiVersioning) to provide the same
> function name for different architectures and dispatching them at run
> time. The same function marked with __attribute__((target("X"))) is
> indeed compiled differently within the same source when X is SSE2 vs. AVX,
> taking advantage of the different instruction sets.
>
Just the facts: that is a GCC-specific feature, supported only by gcc-4.8+,
and completely unsupported by clang, not even by the clang+llvm 3.7 nightly
snapshot.
I do not want to exaggerate, but sorry, I do not see how this could
be a solution... :(
> However, GCC does *NOT* do any of this automatically, or at least I have
> not found any way to make it. A function (with or without a vectorized
> loop) is emitted exactly once each time it appears, and there are no
> internal branches to different (SSE vs AVX) code paths.
>
> So to take full advantage of it we would still need to repeat the
> function for each target hardware.
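
For illustration, here is a minimal sketch of what "repeating the function for
each target" could look like in plain C, without multiversioning. It is only
an assumption of how this might be laid out, not darktable code: scale_plain(),
scale_avx(), pick_scale() and HAVE_X86_DISPATCH are made-up names. It relies on
the per-function target attribute and on __builtin_cpu_supports(), both
compiler extensions on x86, and falls back to the plain version elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: only use the x86 extensions where they are available. */
#if(defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline version, compiled with the default flags of this file. */
static void scale_plain(float *buf, size_t n, float k)
{
  assert(buf != NULL);
  for(size_t i = 0; i < n; i++) buf[i] *= k;
}

#if HAVE_X86_DISPATCH
/* Same body repeated; the compiler may use AVX inside this function only. */
static __attribute__((target("avx"))) void scale_avx(float *buf, size_t n, float k)
{
  assert(buf != NULL);
  for(size_t i = 0; i < n; i++) buf[i] *= k;
}
#endif

typedef void (*scale_fn)(float *buf, size_t n, float k);

/* Choose an implementation at run time based on what the CPU reports. */
static scale_fn pick_scale(void)
{
#if HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if(__builtin_cpu_supports("avx")) return scale_avx;
#endif
  return scale_plain;
}
```

A caller would fetch the pointer once, e.g. scale_fn f = pick_scale();, and
then call f(buf, n, k) for every buffer. The duplicated body is exactly the
cost discussed above.
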
Yes, here I definitely agree with you.
Regardless of how it is done (process() + process_cl() + process_avx() + ...
OR process() + process_cl() + process_simd()), that is the ONLY way that
would eventually allow us to implement some kind of automatic implementation
verification tests without changing much of the code.
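
A rough sketch of what such a verification test could look like: run two
implementations on the same input and count the elements that differ by more
than a tolerance. Everything here is hypothetical (process_ref(), process_alt(),
count_mismatches() and the simplified signature are not the real process()
interface), and process_alt() is only a stand-in for a SIMD code path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical signature shared by all implementations. */
typedef void (*process_fn)(const float *in, float *out, size_t n);

/* Reference implementation: plain scalar loop. */
static void process_ref(const float *in, float *out, size_t n)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * 0.5f + 1.0f;
}

/* Stand-in for an optimized path: two elements per iteration plus a tail. */
static void process_alt(const float *in, float *out, size_t n)
{
  size_t i = 0;
  for(; i + 1 < n; i += 2)
  {
    out[i] = in[i] * 0.5f + 1.0f;
    out[i + 1] = in[i + 1] * 0.5f + 1.0f;
  }
  for(; i < n; i++) out[i] = in[i] * 0.5f + 1.0f;
}

/* Run both implementations on the same input and return the number of
 * elements whose outputs differ by more than eps. */
static size_t count_mismatches(process_fn a, process_fn b, const float *in, size_t n, float eps)
{
  enum { MAX_N = 1024 };
  float out_a[MAX_N], out_b[MAX_N];
  assert(n <= MAX_N);
  a(in, out_a, n);
  b(in, out_b, n);
  size_t bad = 0;
  for(size_t i = 0; i < n; i++)
  {
    float d = out_a[i] - out_b[i];
    if(d < 0.0f) d = -d;
    if(d > eps) bad++;
  }
  return bad;
}
```

Because every variant shares one signature, the same check can be reused for
any pair, e.g. count_mismatches(process_ref, process_alt, in, n, 1e-6f).
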
> In theory this could be done with
> macros, but I don't want to write large functions in a #define.
Ack.
> This multiversioning also requires switching the individual source to C++,
> but that looks like a fairly minor issue.
>
Hmm...
So *if* it is to be done, I would do it in a much simpler way, something
like:
https://github.com/darktable-org/darktable/compare/master...LebedevRI:process-reimagined
> --
> Bruce Guenter <br...@untroubled.org> http://untroubled.org/
>
Roman.
>
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> darktable-devel mailing list
> darktable-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/darktable-devel