Re: Request for feedback: adding additional arch support to libssw using the SIMDE headers

Fabian Klötzl Thu, 12 Dec 2019 04:39:25 -0800

Hi Michael,

On 12.12.19 13:05, Michael Crusoe wrote:

Specifically I'm interested in seeing more of our packages for the latest RaspberryPI systems (arm64).

Likewise, I am interested in testing things on arm. However, remote debugging is not a lot of fun and I am busy writing my dissertation anyway.

Two downsides of using the SIMDE library:
1) Doesn't work with raw assembly, only C/C++ compiler intrinsics (<emmintrin.h> and friends)

I don't see this as a downside. Embedding your intrinsics into the regular source will enable more optimizations for the compiler.

2) Switching between different types of SIMD (like using SSE fallbacks for an SSE2 operation) is done at compile time and not run time.


This is a bummer, but can be solved (see below).


Questions for you all:
1) Is this a good idea?

I think it is a good idea, iff you have a benchmark proving that the optimizations will improve the runtimes significantly. For instance, there are a number of different ways to compute the reverse complement. Using a switch statement is very slow, a table is ten times faster, and a SIMD approach can even give another 7x speedup [3].
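To make the first two variants concrete, here is a minimal sketch in C of a switch-based and a table-based reverse complement. All function names (revcomp_switch, revcomp_table, revcomp_table_init) are made up for illustration and are not taken from libdna or libssw; only uppercase A/C/G/T are handled and everything else becomes 'N'. It shows the shape of the two approaches, not the benchmarked code from [3].

```c
#include <stddef.h>

/* Switch-based complement: a multi-way branch for every base. */
static char complement_switch(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 'N';
    }
}

/* dest must have room for len + 1 bytes. */
void revcomp_switch(char *dest, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[len - 1 - i] = complement_switch(src[i]);
    dest[len] = '\0';
}

/* Table-based complement: one indexed load per base, no branching. */
static char comp_table[256];

/* Fill the lookup table; must be called once before revcomp_table(). */
void revcomp_table_init(void)
{
    for (size_t i = 0; i < 256; i++)
        comp_table[i] = 'N';
    comp_table['A'] = 'T';
    comp_table['C'] = 'G';
    comp_table['G'] = 'C';
    comp_table['T'] = 'A';
}

/* dest must have room for len + 1 bytes. */
void revcomp_table(char *dest, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dest[len - 1 - i] = comp_table[(unsigned char)src[i]];
    dest[len] = '\0';
}
```

Both functions should produce identical output, so the same benchmark harness can time either one; how large the gap is on a given machine is exactly what a benchmark has to show.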

2) Should we carry these patches if upstream doesn't accept them?


Dunno.

3) Any ideas about compiling with different -m{avx2,avx,sse4.2,sse4.1,ssse3,sse3,sse2,sse,mmx} settings + simple wrapper generation to pick the right executable?

I did that just recently for phylonium [1]. Here is the best approach I found: Have each optimized function in a separate file. Compile each with its specific -m setting. Further provide a generic implementation as well as one entrypoint function. The latter can then at call time determine which optimized implementation to use via __builtin_cpu_supports(). Using ifuncs this can even be delegated to dynlink-time.
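A rough single-file sketch of that entrypoint pattern, in C for GCC or Clang. The function names (count_gc, count_gc_generic, count_gc_avx2) are hypothetical and not from phylonium. To keep the example in one file, the "optimized" variant is the same scalar loop marked with a target attribute; in the layout described above it would instead sit in its own file built with -mavx2. The x86 guard keeps the dispatch out of the way on other architectures.

```c
#include <stddef.h>

/* Generic implementation: always available, on every platform. */
static size_t count_gc_generic(const char *seq, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (seq[i] == 'G') | (seq[i] == 'C');
    return n;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1

/* Stand-in for the optimized variant. Assumption: in a real build this
 * function lives in a separate file compiled with -mavx2; the target
 * attribute is only used here so the sketch fits in one file. */
__attribute__((target("avx2")))
static size_t count_gc_avx2(const char *seq, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (seq[i] == 'G') | (seq[i] == 'C');
    return n;
}
#endif

/* Entrypoint: decides at call time which implementation to run. */
size_t count_gc(const char *seq, size_t len)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2"))
        return count_gc_avx2(seq, len);
#endif
    return count_gc_generic(seq, len);
}
```

Callers only ever see count_gc(). If the same check is moved into an ifunc resolver, the GCC documentation says __builtin_cpu_init() has to be called first, because the resolver runs before any constructors.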

The devil is in the details: hurd and kfreebsd (and macOS) don't support ifuncs [2]. __builtin_cpu_supports() needs some help to work in ifuncs. Also you have to disable the shenanigans for non-x86/whatever platforms.

I definitely think that a library is the right place for these optimizations. (That's one of the reasons I started my libdna project.) If you want to optimize libssw you can try using my approach and see how far it gets you. ☺


Best
Fabian


1: https://salsa.debian.org/med-team/phylonium/libs/
2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945133
3: https://github.com/kloetzl/libdna/blob/master/bench/Brevcomp.cxx
