There is also a second way (not counting leaving things as they are),
and I think there are valid arguments for choosing it: OpenMP SIMD.
1. More versions of process() means more code to keep in sync.
   Right now we already have process() with SSE/SSE3 and process_cl() with
   OpenCL, and even now nothing checks whether they produce the same results.
2. Intrinsics are painful to read and relatively easy to get wrong
   (an example from a minute ago:
   https://github.com/klauspost/rawspeed/issues/108).
3. AVX is not the last or the fastest instruction set (there will be more);
   AVX-512 is already planned.
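To make points 1 and 2 concrete, here is a minimal sketch of the kind of consistency check that is currently missing: it runs a plain-C path and an SSE-intrinsics path of the same operation on identical input and counts the elements where they disagree. The operation (scale_plain/scale_sse, a multiply by a gain), the tolerance and the buffer size are hypothetical stand-ins, not darktable code. Note how even this trivial loop needs explicit load/multiply/store intrinsics plus a scalar tail in the SSE version.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define MAX_N 1031 /* hypothetical test buffer size */

/* Hypothetical per-pixel operation used only for illustration: out = in * gain. */
static void scale_plain(const float *in, float *out, size_t n, float gain)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * gain;
}

/* The same operation written with SSE intrinsics, four floats at a time. */
static void scale_sse(const float *in, float *out, size_t n, float gain)
{
  size_t i = 0;
#if defined(__SSE__)
  const __m128 g = _mm_set1_ps(gain);
  for(; i + 4 <= n; i += 4)
    _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(in + i), g));
#endif
  /* Scalar tail (or the whole buffer when SSE is unavailable). */
  for(; i < n; i++) out[i] = in[i] * gain;
}

/* Count elements where two code paths differ by more than tol. */
static size_t count_mismatches(const float *a, const float *b, size_t n, float tol)
{
  assert(a && b);
  size_t bad = 0;
  for(size_t i = 0; i < n; i++)
  {
    float d = a[i] - b[i];
    if(d < 0.0f) d = -d;
    if(d > tol) bad++;
  }
  return bad;
}

/* Run both paths on the same synthetic input and compare the outputs.
   Use an n that is not a multiple of 4 so the scalar tail is exercised too. */
static size_t compare_paths(size_t n, float gain)
{
  static float in[MAX_N], ref[MAX_N], vec[MAX_N];
  assert(n <= MAX_N);
  for(size_t i = 0; i < n; i++) in[i] = (float)i / (float)n;
  scale_plain(in, ref, n, gain);
  scale_sse(in, vec, n, gain);
  return count_mismatches(ref, vec, n, 1e-6f);
}
```

A check like compare_paths() could run in a test suite for every pair of code paths (plain vs. SSE, and in principle CPU vs. OpenCL output read back from the device).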
I propose not taking this route of adding yet more diversity, but doing the
exact opposite: add process_simd(), which will contain no intrinsics at all
but will exploit OpenMP 4.0 SIMD.
This way we get an easy-to-read version of the code that compiles and works
on a CPU with any extension set (probably even ARM, not that we care much
about it) and, provided the compiler supports OpenMP 4.0, automatically uses
the best instruction set available for the target machine.
Doesn't that sound much better than keeping a separate process() per
instruction set? :)
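A minimal sketch of what such an intrinsics-free process_simd() could look like, assuming a simplified, hypothetical signature (darktable's real process() takes the module, pipe piece and ROI structs) and a buffer of 4 floats per pixel. The `#pragma omp simd` only asks the compiler to vectorize the loop; which instruction set it actually emits (SSE, AVX, NEON, ...) depends on the target flags used at build time, e.g. `-mavx` or `-march=native`, together with `-fopenmp` or `-fopenmp-simd`. Without OpenMP the pragma is ignored and the loop still computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified signature for illustration only.
   in/out hold width * height pixels of 4 floats each; mul is a
   per-channel multiplier. */
static void process_simd_sketch(const float *restrict in, float *restrict out,
                                size_t width, size_t height, const float mul[4])
{
  assert(in && out && mul);
  const size_t n = width * height * 4;
  /* No intrinsics: the pragma asks the compiler to vectorize this loop for
     whatever instruction set it is targeting. */
#pragma omp simd
  for(size_t k = 0; k < n; k++)
  {
    const float v = in[k] * mul[k & 3]; /* channel index is k % 4 */
    out[k] = v > 1.0f ? 1.0f : v;       /* clip to 1.0 */
  }
}
```

The same source builds unchanged on machines without OpenMP 4.0; only the degree of vectorization changes.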
PS: I already have (had?) a local branch that does just that: it adds a third
process() instance and calls it when possible/needed. I did that after bringing
up this SIMD topic in #darktable some time ago, and back then, *IIRC*, no one
was against it, but I could be wrong.
Roman.
On Fri, Jun 26, 2015 at 2:19 AM, johannes hanika <hana...@gmail.com> wrote:
> hi!
>
> sounds exciting :)
>
> 1. maybe near the sse detection in src/common/darktable.c?
>
> 2. no preference from my side. i would probably put it into process() with
> a branch, maybe some modules can make use of the same code in between
> SIMDfied blocks.
>
> 3. it's done with dt_alloc_align; most places use 64-byte alignment, some
> only 16. i think the pixel buffers in the pipeline are 64-byte aligned, so
> you should be fine (but it might be safer to add an assertion, just in case).
>
> cheers,
> jo
>
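jo's three answers (detect AVX next to the existing SSE check, branch inside process(), assert the alignment) can be sketched as below. This is a hedged illustration for GCC/Clang on x86, not darktable code: cpu_has_avx(), dispatch_init(), have_avx and process_dispatch() are hypothetical names. The AVX path is just a plain loop compiled with `__attribute__((target("avx")))`, so the compiler may auto-vectorize it with AVX, rather than hand-written intrinsics. The 64-byte assertion reflects jo's note that pipeline buffers come from dt_alloc_align.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

/* Hypothetical global flag, set once at startup (e.g. next to the existing
   SSE detection in src/common/darktable.c). */
static int have_avx = 0;

static int cpu_has_avx(void)
{
#if SKETCH_X86_DISPATCH
  __builtin_cpu_init(); /* only strictly needed before constructors run */
  return __builtin_cpu_supports("avx") != 0;
#else
  return 0;
#endif
}

static void dispatch_init(void)
{
  have_avx = cpu_has_avx();
}

/* Generic path: plain C, compiled for the baseline target. */
static void scale_generic(const float *in, float *out, size_t n, float gain)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * gain;
}

#if SKETCH_X86_DISPATCH
/* Same loop, but this one function may use AVX instructions; it must only
   be called when the CPU reports AVX support. */
__attribute__((target("avx")))
static void scale_avx(const float *in, float *out, size_t n, float gain)
{
  for(size_t i = 0; i < n; i++) out[i] = in[i] * gain;
}
#endif

/* One process()-style entry point that branches on the detected feature. */
static void process_dispatch(const float *in, float *out, size_t n, float gain)
{
  /* Pipeline buffers are expected to be 64-byte aligned; check in debug builds. */
  assert(((uintptr_t)in % 64) == 0 && ((uintptr_t)out % 64) == 0);
#if SKETCH_X86_DISPATCH
  if(have_avx)
  {
    scale_avx(in, out, n, gain);
    return;
  }
#endif
  scale_generic(in, out, n, gain);
}
```

Calling dispatch_init() once at startup keeps the per-call cost of process_dispatch() to a single branch, and modules can share the code between the branched blocks as jo suggests.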
> On Fri, Jun 26, 2015 at 3:05 AM, Bruce Guenter <br...@untroubled.org>
> wrote:
>
>> Hi.
>>
>> I just bought a new laptop that has no GPU, and so no OpenCL support. It
>> does however have a newer CPU with AVX support which could potentially
>> double the performance of some of the algorithms which now use SSE. I
>> tried using Intel's OpenCL-on-CPU package, which uses AVX on the CPU,
>> but it actually ran slower than the existing code for the plugins I
>> tested.
>>
>> So, I would like to start working on adding AVX support, in parallel
>> with the existing SSE code, but I have some questions.
>>
>> 1. Multiple places are likely to need to know if the executing CPU has
>> AVX support. Where should I put the AVX detection code?
>>
>> 2. The same code base will need to work on systems both with and without
>> AVX support. How should I best do this? One option is to add a new
>> process_avx function to IOPs, similar to process_cl, for systems
>> that support it. Another is to switch between the two within process
>> itself. Any preferences?
>>
>> 3. Is image/array allocation already set up to align to 32-byte
>> boundaries, or only to 16 bytes for SSE?
>>
>> Are there any other issues I should be aware of?
>>
>> Thanks.
>>
>> --
>> Bruce Guenter <br...@untroubled.org>
>> http://untroubled.org/
>>
>>
>> ------------------------------------------------------------------------------
>> Monitor 25 network devices or servers for free with OpManager!
>> OpManager is web-based network management software that monitors
>> network devices and physical & virtual servers, alerts via email & sms
>> for fault. Monitor 25 devices for free with no restriction. Download now
>> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
>> _______________________________________________
>> darktable-devel mailing list
>> darktable-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/darktable-devel
>>
>>
>
>