One solution:

  *   do all the math/algorithms outside the main program, in dynamic 
libraries (.so, .dll, ...)
  *   build one dynamic library per ISA you care about (sse.so, avx1.so, 
avx2.so, avx512.so, ...)
  *   from the main program, dynamically load the right library according to 
the features of the CPU it is actually running on (see 
https://github.com/google/cpu_features)
  *   call your API in the loaded library from the main program, so that the 
backend runs the algorithm with the best available optimization
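
A minimal sketch of the selection step in the list above, in C++. It assumes GCC or Clang on x86 (for `__builtin_cpu_supports`) instead of the cpu_features library linked above. The names `CpuFeatures`, `detect_host` and `pick_backend` are hypothetical, and mapping "sse.so" to SSE2 is an assumption. The returned name is what the main program would then pass to `dlopen` (POSIX) or `LoadLibrary` (Windows).

```cpp
#include <cassert>
#include <string>

// Hypothetical feature summary; the field names are illustrative only.
struct CpuFeatures {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the CPU the program is running on right now.
// Assumption: GCC/Clang on x86; elsewhere every field stays false.
inline CpuFeatures detect_host() {
    CpuFeatures f;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2    = __builtin_cpu_supports("sse2") != 0;
    f.avx     = __builtin_cpu_supports("avx") != 0;
    f.avx2    = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Pick the most capable backend the CPU can run, best first.
// Returns an empty string if no vectorized backend is usable.
inline std::string pick_backend(const CpuFeatures& f) {
    if (f.avx512f) return "avx512.so";
    if (f.avx2)    return "avx2.so";
    if (f.avx)     return "avx1.so";
    if (f.sse2)    return "sse.so";
    return "";
}
```

Calling `pick_backend(detect_host())` once at startup would be enough. Keeping the selection separate from the detection makes it possible to test the fallback order on any machine.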

Now, I have the feeling that the long-term solution would be for Eigen to do a 
minimum of JIT. Examples: oneDNN generates its kernels at run time, and asmjit 
( https://github.com/asmjit/asmjit ) is a library for that kind of code 
generation.
Kind
W.

________________________________

From: Edward Lam <[email protected]>
Sent: Thursday, September 17, 2020 9:24 PM
To: [email protected] <[email protected]>
Subject: Re: [eigen] Vectorization for general use

Offhand, I wonder if you could put main() in its own source file and compile it 
without any vectorization compiler options, and have that call your real main() 
renamed in a different source file that does have vectorization compiler 
options enabled. Then your new main() could do CPUID checks (e.g. 
https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will of 
course need to ensure that the CPUID checks are accurate for your compiler 
options, which may present its own challenges.
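
A rough sketch of the guard described above, in C++. It assumes the vectorized build needs AVX2 and that the compiler is GCC or Clang on x86 (for `__builtin_cpu_supports`). The names `host_has_avx2`, `guarded_launch` and `MainFn` are hypothetical. This code would live in the source file that is compiled without any vectorization options.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: the vectorized build needs AVX2. Adjust the check to match
// the compiler options actually used for the other source files.
// GCC/Clang on x86 only; other compilers would need their own check.
inline bool host_has_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Signature of the renamed "real" main in the vectorized source file.
using MainFn = int (*)(int, char**);

// Call real_main only if the CPU check passed; otherwise print a
// message and return a nonzero exit code instead of crashing later.
inline int guarded_launch(bool cpu_ok, MainFn real_main, int argc, char** argv) {
    if (!cpu_ok) {
        std::fputs("This build requires a CPU with AVX2 support.\n", stderr);
        return 1;
    }
    return real_main(argc, argv);
}
```

The new main() would then reduce to `return guarded_launch(host_has_avx2(), real_main, argc, argv);`, where `real_main` stands for the renamed original entry point.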

Cheers,
-Edward

On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald 
<[email protected]> wrote:
I maintain an open source program that uses Eigen.  The vast majority of my 
users do not compile the program, instead downloading a pre-compiled binary 
from our website.  About 80% are on Windows, 10% on Mac and 10% on Linux.  I 
only provide x86 builds, 32- and 64-bit on Windows, 64-bit only on Mac and 
Linux.  We may eliminate the 32-bit Windows build soon.

Historically, I have compiled with no special flags enabling vectorization 
options for the CPU.  I would like to pursue this as I expect it will unlock 
some nice performance gains.  However, I'd like to keep things simple and 
compatible for users.

What happens when someone runs a program compiled with vectorization when their 
CPU does not support it?  If it fails, how graceful is the failure?

Is there a standard approach to identify the capabilities of a given machine?  
I could add that to my program and survey users before making a change...  
Would such code still run on a machine that was in the process of failing due 
to not having support for the built-in vectorization?  I.e. if it is crashing, 
can we send a message as to why we're going down?

Is there a graceful way to support multiple options?

Any tips from other broad-use applications are greatly appreciated.

Rob




