On Fri, Jun 26, 2015 at 6:57 AM, Bruce Guenter <br...@untroubled.org> wrote:

> On Fri, Jun 26, 2015 at 02:50:45AM +0300, Roman Lebedev wrote:
> > There is also a second way (not counting leaving it as it is),
> > and I think there are valid arguments why it should be chosen: OpenMP
> > SIMD.
>
> I agree that would be the ideal, but I have several issues below.
>
> I had looked at that and for some reason decided it wasn't a real option
> until GCC 5.0 comes out. I now see GCC 4.9 has support for that, which I
> have on both my development systems. If clang doesn't have it yet, it
> will soon as well (it's completed, but I don't know off hand if it's in
> the latest stable version).
>
The idea is to (temporarily) keep three versions of process():

* the OpenCL one
* the SSE one
* the SIMD one

This way, people with modern compilers (GCC 4.9, clang master IIUC) will
use the SIMD path. Users with older compilers will still be stuck with the
x86 (SSE) path, but that's not a loss of a feature, and we have a fallback
path in case the SIMD version turns out not to be good.

In a couple of years, hopefully, we will drop the SSE path...
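
A minimal sketch of how the CPU side of such a temporary split could be selected at compile time. Everything here is an assumption made for illustration: the function names, the gain operation, and the 4-float pixel layout are hypothetical and not darktable's actual process() signature. The `_OPENMP >= 201307` check corresponds to OpenMP 4.0.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical operation: multiply every channel by a gain.
 * Buffers hold interleaved 4-float pixels (R, G, B, unused). */
void process_plain(const float *in, float *out, size_t npixels, float gain)
{
  const size_t n = 4 * npixels;
  for(size_t k = 0; k < n; k++) out[k] = in[k] * gain;
}

/* Intrinsic-free path: the same loop, annotated for OpenMP 4.0 SIMD.
 * The pragma is only emitted when the compiler reports OpenMP >= 4.0. */
void process_simd(const float *in, float *out, size_t npixels, float gain)
{
  const size_t n = 4 * npixels;
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd
#endif
  for(size_t k = 0; k < n; k++) out[k] = in[k] * gain;
}

#if defined(__SSE__)
/* Hand-written SSE path: one pixel (4 floats) per register. */
void process_sse(const float *in, float *out, size_t npixels, float gain)
{
  const __m128 g = _mm_set1_ps(gain);
  for(size_t p = 0; p < npixels; p++)
    _mm_storeu_ps(out + 4 * p, _mm_mul_ps(_mm_loadu_ps(in + 4 * p), g));
}
#endif

/* Compile-time choice: SIMD path on OpenMP 4.0 compilers,
 * SSE path on older x86 compilers, plain C everywhere else. */
void process(const float *in, float *out, size_t npixels, float gain)
{
#if defined(_OPENMP) && _OPENMP >= 201307
  process_simd(in, out, npixels, gain);
#elif defined(__SSE__)
  process_sse(in, out, npixels, gain);
#else
  process_plain(in, out, npixels, gain);
#endif
}
```

With this layout, dropping the SSE path later would mean deleting process_sse() and one branch of the preprocessor chain.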




> That would, however, limit deployment to only the newest versions of
> most Linux OS (Ubuntu 15.04, Fedora 21, Gentoo only with manual
> unmasking).
>
> > 1. more versions of process() - more code to keep synced. Right now we
> > already have process() with SSE[3] and process_cl() with OpenCL. And
> > even now, there is no checking whether they produce the same results...
>
> This has bothered me too. I have wished there was some way of just
> running a single IOP for testing, benchmarking, and verification.
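
A rough sketch of the kind of check being wished for here: run two implementations of the same operation on the same input and report the largest difference. The function-pointer type and the two toy implementations are hypothetical stand-ins, not darktable code; a real harness would have to drive an actual IOP and would probably need a tolerance rather than exact equality.

```c
#include <stddef.h>

/* Hypothetical common signature for two implementations of one operation. */
typedef void (*process_fn)(const float *in, float *out, size_t n);

/* Toy "reference" implementation: double every value by multiplying. */
void double_by_mul(const float *in, float *out, size_t n)
{
  for(size_t k = 0; k < n; k++) out[k] = in[k] * 2.0f;
}

/* Toy "alternative" implementation: double every value by adding. */
void double_by_add(const float *in, float *out, size_t n)
{
  for(size_t k = 0; k < n; k++) out[k] = in[k] + in[k];
}

/* Run both implementations on the same input and return the largest
 * absolute per-element difference between their outputs. */
float max_abs_diff(process_fn ref, process_fn alt, const float *in,
                   float *out_ref, float *out_alt, size_t n)
{
  ref(in, out_ref, n);
  alt(in, out_alt, n);
  float worst = 0.0f;
  for(size_t k = 0; k < n; k++)
  {
    float d = out_ref[k] - out_alt[k];
    if(d < 0.0f) d = -d;
    if(d > worst) worst = d;
  }
  return worst;
}
```

A caller would compare the returned value against a threshold chosen per operation, since vectorized and scalar code paths may legitimately differ in the last bits.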
>
> > 3. AVX is not the last and fastest set (as in, there will be more),
> > AVX-512 is already planned
>
> That is true.
>
> The flip side is that getting vectorization to work as efficiently as
> hand-tuned code may require extra data copies to reorganize the data (i.e.
> interleaved RGB/Lab into planar), which may eat some of the gains. Maybe
> compilers are smart enough, but I don't have high hopes for that.
>
> Now, the copies may be needed to take advantage of AVX anyways -- most
> of the SSE code is built on the assumption of being able to put one
> pixel in a register (R G B blank), which frequently requires large
> changes to scale up.
>
> I'll have to do some experimenting to see just how smart GCC is.
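
To make the "extra data copies" concrete, here is a small sketch of the interleaved-to-planar reorganization and back. The 4-floats-per-pixel layout (R, G, B, unused) follows the description above; the function names are made up for this example, and whether the copies pay off is exactly the open question that needs measuring.

```c
#include <stddef.h>

/* Split interleaved pixels (R, G, B, unused) into three planes, so that
 * a wide vector register can hold the same channel of several pixels. */
void interleaved_to_planar(const float *rgbx, float *r, float *g, float *b,
                           size_t npixels)
{
  for(size_t p = 0; p < npixels; p++)
  {
    r[p] = rgbx[4 * p + 0];
    g[p] = rgbx[4 * p + 1];
    b[p] = rgbx[4 * p + 2];
  }
}

/* Merge three planes back into interleaved pixels; the 4th slot is zeroed. */
void planar_to_interleaved(const float *r, const float *g, const float *b,
                           float *rgbx, size_t npixels)
{
  for(size_t p = 0; p < npixels; p++)
  {
    rgbx[4 * p + 0] = r[p];
    rgbx[4 * p + 1] = g[p];
    rgbx[4 * p + 2] = b[p];
    rgbx[4 * p + 3] = 0.0f;
  }
}
```

Each conversion touches every pixel once, so the cost is two extra passes over the image on top of the actual processing.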
>
> > I propose to not take this route of adding yet more diversity, but do
> > the directly opposite thing: Add process_simd(), which will have
> > absolutely zero intrinsics, but will exploit OpenMP 4.0 SIMD.
>
> If this is of benefit, why not make the base process() use it? That is,
> why a third function at all?
>
Backward compatibility for older compilers (see above).


> In fact, it could potentially (long term) eliminate the need for
> process_cl() as well, with compilers gaining support for offloading work
> to accelerators. As I understand it, this is a big push for AMD's APUs.
>
>
That would be awesome, but at this stage it's still science fiction, so
let's keep an eye on it but not count on it just yet.


> > This way, we will have an easy-to-read version of the code, that will
> > be compilable and will work on a CPU with any extension set (even ARM
> > probably, not that we care at all about it) and, given that the
> > compiler supports OpenMP 4.0, will automatically be using the best
> > intrinsics set available on the machine in question.
>
> Will it? Or will it only be using the best intrinsics set on the system
> it was built, which will of necessity need to be the least common
> denominator for binary distributions? That is, do any of the existing
> compilers build multiple code paths and choose at runtime based on
> processor features? I've not heard of that, but I'd love to be
> corrected.
>
>
I think it does, but that's something we need to check.
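
Until that is checked, one explicit way to get runtime selection is sketched below. It assumes GCC or clang on x86 and uses the `target` function attribute plus `__builtin_cpu_supports`. The function names are hypothetical, and this is manual dispatch written by hand, not something `#pragma omp simd` is known to provide by itself.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

typedef void (*scale_fn)(const float *in, float *out, size_t n, float gain);

/* Baseline version, compiled for the default (lowest common) target. */
void scale_generic(const float *in, float *out, size_t n, float gain)
{
  for(size_t k = 0; k < n; k++) out[k] = in[k] * gain;
}

#if HAVE_X86_DISPATCH
/* Same source, but the compiler may use AVX when vectorizing this copy. */
__attribute__((target("avx")))
void scale_avx(const float *in, float *out, size_t n, float gain)
{
  for(size_t k = 0; k < n; k++) out[k] = in[k] * gain;
}
#endif

/* Pick an implementation based on the CPU the binary is running on. */
scale_fn pick_scale(void)
{
#if HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if(__builtin_cpu_supports("avx")) return scale_avx;
#endif
  return scale_generic;
}
```

A binary built this way can still target the least common denominator by default, while the AVX copy is only called on machines that report support for it.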


> --
> Bruce Guenter <br...@untroubled.org>                http://untroubled.org/
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> darktable-devel mailing list
> darktable-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/darktable-devel
>
>
