On Fri, Jun 26, 2015 at 7:57 AM, Bruce Guenter <br...@untroubled.org> wrote:
> On Fri, Jun 26, 2015 at 02:50:45AM +0300, Roman Lebedev wrote:
> > There is also a second way (not counting leaving it as it is),
> > and I think there are valid arguments why it should be chosen: OpenMP
> > SIMD.
>
> I agree that would be the ideal, but I have several issues below.
>
> I had looked at that and for some reason decided it wasn't a real option
> until GCC 5.0 comes out. I now see GCC 4.9 has support for that, which I
> have on both my development systems. If clang doesn't have it yet, it
> will soon as well (it's completed, but I don't know off hand if it's in
> the latest stable version).
>
> That would, however, limit deployment to only the newest versions of
> most Linux OS (Ubuntu 15.04, Fedora 21, Gentoo only with manual
> unmasking).
>
No, because the old process() with the SSE/SSE2/SSE3 intrinsics will still be
there.
>
> > 1. more versions of process() - more code to keep synced. Right now we
> > already have process() with SSE[3] and process_cl() with OpenCL. And
> > even now, there is no checking whether they produce the same results...
>
> This has bothered me too. I have wished there was some way of just
> running a single IOP for testing, benchmarking, and verification.
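Even without a full harness, the comparison step itself could be small. A rough sketch, where max_abs_diff() is a made-up helper and not anything that exists in darktable; the two buffers would be the outputs of process() and process_cl() for the same input:

```c
#include <stddef.h>

/* Hypothetical helper, not part of darktable: returns the largest absolute
 * difference between two float buffers of n elements each. */
static float max_abs_diff(const float *a, const float *b, size_t n)
{
  float worst = 0.0f;
  for(size_t i = 0; i < n; i++)
  {
    float d = a[i] - b[i];
    if(d < 0.0f) d = -d; /* absolute value without needing libm */
    if(d > worst) worst = d;
  }
  return worst;
}
```

A test would then just check that the result stays below some tolerance; what tolerance is acceptable per module is an open question.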
>
> > 3. AVX is not the last and fastest set (as in, there will be more),
> > AVX-512 is already planned
>
> That is true.
>
> The flip side is that getting vectorization to work as efficiently as
> hand-tuned code may require extra data copies to reorganize the data (i.e.
> interleaved RGB/Lab into planar) which may eat some of the gains. Maybe
> compilers are smart enough, but I don't have high hopes for that.
>
> Now, the copies may be needed to take advantage of AVX anyways -- most
> of the SSE code is built on the assumption of being able to put one
> pixel in a register (R G B blank), which frequently requires large
> changes to scale up.
>
> I'll have to do some experimenting to see just how smart GCC is.
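To make the copy in question concrete, here is a minimal sketch of the interleaved-to-planar step, assuming 4 floats per pixel (R G B blank) as described above. The function and parameter names are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical sketch: split npixels interleaved pixels (R G B unused)
 * into three separate planar arrays, one per channel. */
static void rgbx_to_planar(const float *in, float *r, float *g, float *b,
                           size_t npixels)
{
  for(size_t i = 0; i < npixels; i++)
  {
    r[i] = in[4 * i + 0];
    g[i] = in[4 * i + 1];
    b[i] = in[4 * i + 2];
    /* in[4 * i + 3] is the blank fourth slot and is dropped */
  }
}
```

Whether this extra pass costs more than the wider vectors gain is exactly the thing that would need measuring.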
>
> > I propose to not take this route of adding yet more diversity, but do
> > the directly opposite thing: Add process_simd(), which will have
> > absolutely zero intrinsics, but will exploit OpenMP 4.0 SIMD.
>
> If this is of benefit, why not make the base process() use it? That is,
> why a third function at all?
>
Last time it was discussed, it was agreed that there are hardly any compilers
that support it:
*) only GCC 4.9, which is considered half-broken by many people, and
*) GCC 5, which has just been released and also seems to be relatively
   broken, since it produces a very strange "array subscript is above array
   bounds" error when building in Release mode.
Right now we support gcc-4.6 as the oldest compiler, and if we were to
completely switch to OpenMP SIMD, that would mean depending on OpenMP 4:
gcc-4.9+, clang-3.7/3.8.
And for a couple of years that is just a no-go: they are not mature enough,
and not widespread enough.
Though the idea was to bring this up at some LGM...
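For reference, this is roughly the kind of loop meant by "zero intrinsics". It is only a sketch: scale_simd() and its parameters are invented for illustration and are not darktable code. With GCC 4.9+ the pragma takes effect when building with -fopenmp or -fopenmp-simd; an older compiler just ignores it and compiles a plain loop:

```c
#include <stddef.h>

/* Hypothetical example: multiply every element by a gain factor.
 * The pragma asks the compiler to vectorize the loop; no intrinsics used. */
static void scale_simd(const float *in, float *out, size_t n, float gain)
{
#pragma omp simd
  for(size_t i = 0; i < n; i++)
    out[i] = in[i] * gain;
}
```

The result is the same whether or not the pragma is honored, which is why the code would still build and run with gcc-4.6, just without the vectorization hint.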
> In fact, it could potentially (long term) eliminate the need for
> process_cl() as well, with compilers gaining support for offloading work
> to accelerators. As I understand it, this is a big push for AMD's APUs.
>
Yeah, maybe, maybe not, but as Jeremy said, that will still be
science fiction for some time.
> > This way, we will have an easy-to-read version of the code, that will
> > be compilable and will work on a CPU with any extension set (even ARM
> > probably, not that we care at all about it) and, given that the
> > compiler supports OpenMP 4.0, will automatically be using the best
> > intrinsics set available on the machine in question.
>
> Will it? Or will it only be using the best intrinsics set on the system
> it was built, which will of necessity need to be the least common
> denominator for binary distributions? That is, do any of the existing
> compilers build multiple code paths and choose at runtime based on
> processor features? I've not heard of that, but I'd love to be
> corrected.
>
Good question, no comment so far :/
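What can be done by hand, at least with GCC on x86, is to compile the same plain C loop twice and pick one at startup. A rough sketch under those assumptions: process_generic(), process_avx() and choose_process() are made-up names, the loop body is a placeholder, and I have not checked whether any compiler does this automatically for OpenMP SIMD loops. The target attribute and __builtin_cpu_supports() are GCC extensions (the latter since GCC 4.8):

```c
#include <stddef.h>

typedef void (*process_fn)(const float *in, float *out, size_t n);

/* Baseline build of the loop, using only the default instruction set. */
static void process_generic(const float *in, float *out, size_t n)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * 2.0f;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX inside this one function. */
__attribute__((target("avx")))
static void process_avx(const float *in, float *out, size_t n)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * 2.0f;
}
#endif

/* Pick an implementation based on what the running CPU supports. */
static process_fn choose_process(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if(__builtin_cpu_supports("avx")) return process_avx;
#endif
  return process_generic;
}
```

So a binary built for the least common denominator could still use AVX where it exists, but only because we wrote the dispatch ourselves.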
> --
> Bruce Guenter <br...@untroubled.org> http://untroubled.org/
>
Roman.
>
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> darktable-devel mailing list
> darktable-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/darktable-devel