A solution:

* Do all the math/algorithms outside the main program, in dynamic libraries (.so, .dll, ...).
* Build one dynamic library per ISA you care about (sse.so, avx1.so, avx2.so, avx512.so, ...).
* From the main program, dynamically load the right library according to the features of the CPU it is actually running on (see https://github.com/google/cpu_features).
* Call your API in the loaded library from the main program, so that the backend runs the algorithm with the best available optimizations.
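The selection step in the list above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses the GCC/Clang `__builtin_cpu_supports` builtin rather than the cpu_features library linked above, the `CpuFeatures` struct and the function names are hypothetical, and `generic.so` is an invented fallback name that is not in the list. The actual loading (`dlopen` on Linux/Mac, `LoadLibrary` on Windows) is left out.

```cpp
#include <string>

// Hypothetical summary of the ISA levels named in the list above.
struct CpuFeatures {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the CPU we are running on. Assumes GCC or Clang on x86;
// with any other compiler or architecture every flag stays false.
inline CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2    = __builtin_cpu_supports("sse2") != 0;
    f.avx     = __builtin_cpu_supports("avx") != 0;
    f.avx2    = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Pick the most capable backend library for the given features.
// "generic.so" is a hypothetical non-vectorized fallback.
inline std::string pick_backend(const CpuFeatures& f) {
    if (f.avx512f) return "avx512.so";
    if (f.avx2)    return "avx2.so";
    if (f.avx)     return "avx1.so";
    if (f.sse2)    return "sse.so";
    return "generic.so";
}
```

The main program would then pass `pick_backend(detect_cpu_features())` to the platform's library loader and look up the API entry points in the loaded library. Keeping detection separate from selection makes the selection logic easy to test on any machine.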
Now, I have the feeling that the long-term solution would be for Eigen to do a minimum of JIT. Example: oneDNN with asmjit: https://github.com/asmjit/asmjit

Kind W.

________________________________
From: Edward Lam <[email protected]>
Sent: Thursday, September 17, 2020 9:24 PM
To: [email protected] <[email protected]>
Subject: Re: [eigen] Vectorization for general use

Offhand, I wonder if you could put main() in its own source file and compile it without any vectorization compiler options, and have that call your real main(), renamed, in a different source file that does have vectorization compiler options enabled. Then your new main() could do CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out gracefully. You will of course need to ensure that the CPUID checks are accurate for your compiler options, which may present its own challenges.

Cheers,
-Edward

On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <[email protected]> wrote:

I maintain an open source program that uses Eigen. The vast majority of my users do not compile the program, instead downloading a pre-compiled binary from our website. About 80% are on Windows, 10% on Mac and 10% on Linux.

I only provide x86 builds: 32- and 64-bit on Windows, 64-bit only on Mac and Linux. We may eliminate the 32-bit Windows build soon.

Historically, I have compiled with no special flags enabling vectorization options for the CPU. I would like to pursue this as I expect it will unlock some nice performance gains. However, I'd like to keep things simple and compatible for users.

What happens when someone runs a program compiled with vectorization when their CPU does not support it? If it fails, how graceful is the failure?

Is there a standard approach to identify the capabilities of a given machine? I could add that to my program and survey users before making a change...
Would such code still run on a machine that was in the process of failing due to not having support for the built-in vectorization? I.e., if it is crashing, can we send a message as to why we're going down?

Is there a graceful way to support multiple options?

Any tips from other broad-use applications are greatly appreciated.

Rob
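For reference, Edward's suggestion earlier in the thread (a small entry point compiled without vectorization flags that checks the CPU before calling the real program) might look roughly like the sketch below. The names `cpu_supports_avx2` and `run_guarded` are hypothetical, AVX2 is only an assumed target for the vectorized build, and the check relies on the GCC/Clang `__builtin_cpu_supports` builtin. An MSVC build would need a different check, for example via the `__cpuid` intrinsic from `<intrin.h>`.

```cpp
#include <cstdio>
#include <cstdlib>

// Assumption: the vectorized part of the program was built for AVX2.
// Uses a GCC/Clang builtin on x86; with any other compiler or
// architecture this sketch conservatively reports "not supported".
inline bool cpu_supports_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Guard around the renamed original main(). This function must live in
// a source file compiled WITHOUT vectorization flags, so that it can
// run on any CPU and print a message instead of crashing.
inline int run_guarded(bool cpu_ok,
                       int (*real_main)(int, char**),
                       int argc, char** argv) {
    if (!cpu_ok) {
        std::fprintf(stderr,
                     "This build requires a CPU with AVX2 support.\n");
        return EXIT_FAILURE;
    }
    return real_main(argc, argv);
}
```

The new `main()` would then simply be `return run_guarded(cpu_supports_avx2(), real_main, argc, argv);`, where `real_main` is the original `main()` renamed and compiled with the vectorization options. Passing the check result in as a parameter keeps the guard testable on machines with or without the feature.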
