On Monday, 19 July 2021 at 16:44:35 UTC, Guillaume Piolat wrote:
On Monday, 19 July 2021 at 10:49:56 UTC, kinke wrote:
This workaround is actually missing the clobber constraint for `%2`, which might be problematic after inlining.
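For illustration, a minimal, purely hypothetical sketch of where such a clobber goes in an LLVM-style constraint string via `ldc.llvmasm.__asm` (this is not the workaround from the quoted post; `addViaScratch` and the choice of `xmm7` are made up):

```
import core.simd;
import ldc.llvmasm;

int4 addViaScratch(int4 a, int4 b)
{
    // xmm7 is used as a scratch register inside the asm, so it is listed as a
    // clobber (`~{xmm7}`). Without that, a caller's live value in xmm7 could
    // get silently corrupted once this is inlined.
    return __asm!int4(
        "movdqa $1, %xmm7
         paddd  %xmm7, $0",
        "=x,x,0,~{xmm7}", // output, input, input tied to $0, clobbered register
        b, a);
}
```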


Another, unrelated issue with asm/__asm is that it doesn't use VEX encoding consistently with the normal compiler output.

    sometimes you might want: paddq x, y
              at other times: vpaddq x, y, z

but rarely both in the same program.
So the resulting VEX transition penalties (if they are still a thing) can easily nullify any gains.
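To make the mixing concrete, here's a hedged sketch (`mixedEncodings` is made up, and the exact mnemonics depend on `-mcpu`): with e.g. `-mcpu=haswell` I'd expect the plain D addition to come out as VEX-encoded `vpaddd`, while the hand-written asm stays legacy-SSE `paddd`, so both encodings sit next to each other in one function:

```
import core.simd;
import ldc.llvmasm;

int4 mixedEncodings(int4 a, int4 b, int4 c)
{
    int4 s = a + b; // compiler codegen: vpaddd when AVX is enabled
    // hand-written asm: emitted exactly as written, i.e. non-VEX paddd
    return __asm!int4("paddd $1, $0", "=x,x,0", c, s);
}
```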

You know that asm is best avoided whenever possible. Unfortunately, AFAIK, intel-intrinsics doesn't fit the usual 'don't worry, simply compile all your code with an appropriate -mattr/-mcpu option' recommendation, because it relies on runtime detection of available CPU instructions.
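A hedged sketch of what that runtime detection boils down to (using druntime's `core.cpuid` feature properties here, not intel-intrinsics' actual internals): the flags are read from CPUID when the program runs, so they are independent of the `-mattr`/`-mcpu` baseline the binary was built with:

```
import core.cpuid : avx2, sse2;
import std.stdio : writeln;

void main()
{
    // Queried from CPUID at runtime on the host machine,
    // regardless of the compile-time -mattr/-mcpu settings.
    writeln("SSE2 available at runtime: ", sse2);
    writeln("AVX2 available at runtime: ", avx2);
}
```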

I've just tried another option, but it doesn't play nice with inlining:

```
import core.simd;
import ldc.attributes;

@target("sse2") // use SSE2 for this function
int4 _mm_add_int4(int4 a, int4 b)
{
    return a + b; // perfect: paddd %xmm1, %xmm0
}

int4 wrapper(int4 a, int4 b)
{
    return _mm_add_int4(a, b);
}
```

Compiling with `-O -mtriple=i686-linux-gnu -mcpu=i686` (=> no SSE2 by default) shows that the inlined version inside `wrapper()` is the mega-slow one, so unfortunately the extra target feature isn't applied transitively when inlining.
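For completeness, the only mitigation I can think of with this approach is annotating the caller as well. A hedged sketch extending the snippet above (`wrapper_sse2` is made up), which I'd expect to inline down to a single `paddd` again, but that just pushes the problem one level up the call tree:

```
@target("sse2") // the caller carries the attribute too
int4 wrapper_sse2(int4 a, int4 b)
{
    return _mm_add_int4(a, b); // expected: inlines to a single paddd
}
```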
