Re: Optionally using more advanced CPU features

Ludovic Courtès Tue, 22 Aug 2017 02:22:11 -0700

Hi,

Ricardo Wurmus <rek...@elephly.net> skribis:


> I was wondering how we should go about optionally building software for
> more advanced CPU features.  Currently, we build software for the lowest
> common feature set among x86_64 CPUs.  That’s good for portability, but
> not so good for performance.
>
> Enabling CPU features often happens through configure flags, but
> expressing support at that level in our package definitions seems bad.
> How can we make it possible for users to build their software for
> different CPUs?

To some extent, I think this is a compiler/OS/upstream issue.  By that I
mean that the best way to make use of extra CPU features is the “IFUNC”
feature of GNU ld.so, which is what libc does: it has variants of strcmp
etc. tuned for various CPU extensions like SSE, and the right one gets
picked at load time.  Software like GMP, Nettle, or MPlayer also does
this kind of selection at run time, but using custom mechanisms.
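
To illustrate what a “custom mechanism” can look like, here is a minimal
sketch in C.  It is not taken from GMP, Nettle, or MPlayer; the names
‘sum’, ‘sum_generic’, ‘sum_avx2’, and ‘select_sum’ are made up for the
example.  It assumes GCC or Clang, and falls back to the baseline variant
on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline variant, compiled for the default (lowest common) ISA. */
static long sum_generic(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler is allowed to use AVX2 in this one. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef long (*sum_fn)(const int *, size_t);

/* Ask the CPU what it supports and return the matching variant. */
static sum_fn select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Public entry point: selects a variant on first use, then reuses it.
   Not synchronized; a real implementation would select once at startup
   or protect this for multi-threaded callers. */
long sum(const int *v, size_t n)
{
    static sum_fn impl;   /* NULL until the first call */

    assert(v != NULL || n == 0);
    if (impl == NULL)
        impl = select_sum();
    return impl(v, n);
}
```

IFUNC does essentially the same thing, except that the dynamic linker
runs the selection function when it resolves the symbol, so callers pay
no extra check on each call.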

GCC now has a ‘target_clones’ function attribute, which instructs it to
generate several variants of a function and use IFUNC to pick the right
one at load time (info "(gcc) Common Function Attributes").  Ideally,
upstream would use this.
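
As a rough sketch of what that could look like upstream: the function
‘dot’ and the ‘MULTIVERSIONED’ macro below are hypothetical, and the
list of targets ("avx2", "sse4.2", "default") is just an example choice.
The guard assumes x86_64 GNU/Linux with a compiler that knows the
attribute; elsewhere the macro expands to nothing and you get a single
baseline version.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback: no multi-versioning, only the baseline build. */
#define MULTIVERSIONED
#if defined(__x86_64__) && defined(__linux__) && defined(__has_attribute)
# if __has_attribute(target_clones)
#  undef MULTIVERSIONED
/* One clone per listed target, plus an IFUNC resolver that picks among
   them; "default" is the portable baseline. */
#  define MULTIVERSIONED \
     __attribute__((target_clones("avx2", "sse4.2", "default")))
# endif
#endif

/* A hot spot worth cloning: a plain loop the compiler can vectorize
   differently for each target. */
MULTIVERSIONED
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;

    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Callers just call ‘dot’ as usual; the source of the function body stays
unchanged, which is what makes this attractive for annotating a few hot
spots.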

When upstream does that, we have portable-yet-efficient “fat” binaries,
and there’s nothing to do on our side.  :-)

> We can cross-compile for other architectures on the command line with
> “--target” and “--system”; can we allow for compilation with special CPU
> features across the graph with “--features”?  Build system abstractions
> or package definitions would then be changed to recognize these features
> and modify the corresponding flags as needed.

I’ve considered this, but designing it would be tricky, and I don’t
think it’s quite the right approach.

There’s probably scientific software out there that can benefit from
using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
the tricks above.  When we find such a piece of software, I think we
should investigate and (1) see whether it actually benefits from those
ISA extensions, and (2) see whether it would be feasible to just use
‘target_clones’ or similar on the hot spots.

If it turns out that this approach doesn’t scale or isn’t suitable, then
we can think more about what you suggest.  But before starting such an
endeavor, I would really like to get a better understanding of the
software we’re talking about and the options that we have.

WDYT?

Thanks,
Ludo’.
