Hi Sasha,

Thanks for the advice. I didn't quite catch the point. Would you explain the purpose of this proposal a bit?
We do prefer compiler auto-vectorization to explicit SIMD code, even if the C++ code is slower than the SIMD one (20% is acceptable, IMO). And we do support runtime-dispatched kernels based on the target machine architecture. What is left to discuss, then, is how to deal with code that is not auto-vectorizable but can be manually optimized with SIMD instructions.

It looks like your proposal is to do nothing more than add the appropriate compiler flags and wait for compilers to become smarter in the future. I think this is a reasonable approach, probably in many cases. But if we do want to manually tune the code, I believe a SIMD library is the best way. To me there is no "replacing" between xsimd and auto-vectorization; they each do their own job.

Yibo

-----Original Message-----
From: Sasha Krassovsky <krassovskysa...@gmail.com>
Sent: Wednesday, March 30, 2022 6:58 AM
To: dev@arrow.apache.org; emkornfi...@gmail.com
Subject: Re: [C++] Replacing xsimd with compiler autovectorization

xsimd has three problems I can think of right now:

1) xsimd code looks like normal SIMD code: you have to explicitly do loads and stores, you have to explicitly unroll and stride through your loop, and you have to explicitly process the tail of the loop. This makes writing a large number of kernels extremely tedious and error-prone. In comparison, a single three-line scalar for loop is easier to both read and write (see the sketch below).

2) xsimd limits the freedom an optimizer has to select instructions and do other optimizations, as it's just a thin wrapper over normal intrinsics. One concrete example: if we wanted to take advantage of the dynamic dispatch instruction set xsimd offers, the loop strides would no longer be compile-time constants, which might prevent the compiler from performing loop unrolling (how would it know that the stride isn't just 1?).

3) Lastly, if we ever want to support a new architecture (like Power9 or RISC-V), we'd have to wait for an xsimd backend to become available. On the other hand, if SiFive came out with a hot new chip supporting RV64V, all we'd have to do to support it is add the appropriate compiler flag to the CMakeLists.

As for using an external build system, I'm not sure how much complexity it would add, but at the very least I suspect it would work out of the box if you only wanted to support scalar kernels. Otherwise I don't think it would add much more complexity than we currently have for detecting architectures at build time.

Sasha
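A rough sketch of the contrast in point 1, for illustration only: the kernel and variable names are made up, and the xsimd side is written against the batch-style API as I understand it, so treat the exact calls as approximate rather than as existing Arrow code.

    #include <cstddef>
    #include <xsimd/xsimd.hpp>

    // Plain scalar kernel: a three-line for loop that the compiler is free
    // to auto-vectorize however it sees fit.
    void AddScalar(const float* a, const float* b, float* out, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
      }
    }

    // Hand-written xsimd kernel: explicit loads and stores, an explicit
    // stride through the loop, and an explicit scalar tail.
    void AddXsimd(const float* a, const float* b, float* out, std::size_t n) {
      using batch = xsimd::batch<float>;
      constexpr std::size_t lanes = batch::size;
      std::size_t i = 0;
      for (; i + lanes <= n; i += lanes) {
        batch va = batch::load_unaligned(a + i);
        batch vb = batch::load_unaligned(b + i);
        (va + vb).store_unaligned(out + i);
      }
      for (; i < n; ++i) {  // leftover elements that don't fill a full batch
        out[i] = a[i] + b[i];
      }
    }

Both functions compute the same thing; the second just spells out by hand the bookkeeping the first leaves to the compiler.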
On Tue, Mar 29, 2022 at 3:26 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Sasha,
> Could you elaborate on the problems of the XSIMD dependency? What you
> describe sounds a lot like what XSIMD provides in a prepackaged form
> and without the extra CMake magic.
>
> I have to occasionally build Arrow with an external build system, and
> it sounds like this type of logic could add complexity there.
>
> Thanks,
> Micah
>
> On Tue, Mar 29, 2022 at 3:14 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>
> > Hi everyone,
> > I've noticed that we include xsimd as an abstraction over all of the
> > SIMD architectures. I'd like to propose a different solution which
> > would result in fewer lines of code, while being more readable.
> >
> > My thinking is that anything simple enough to abstract with xsimd
> > can be auto-vectorized by the compiler. Any more interesting SIMD
> > algorithm is usually tailored to the target instruction set and
> > can't be abstracted away with xsimd anyway.
> >
> > With that in mind, I'd like to propose the following strategy:
> > 1. Write a single source file with simple, element-at-a-time for-loop
> > implementations of each function.
> > 2. Compile this same source file several times with different compile
> > flags for different vectorization (e.g. if we're on an x86 machine
> > that supports AVX2 and AVX512, we'd compile once with -mavx2 and once
> > with -mavx512vl).
> > 3. Functions compiled with different instruction sets can be
> > differentiated by a namespace, which gets defined during the compiler
> > invocation. For example, for AVX2 we'd invoke the compiler with
> > -DNAMESPACE=AVX2 and then, for something like elementwise addition of
> > two arrays, we'd call arrow::AVX2::VectorAdd.
> >
> > I believe this would let us remove xsimd as a dependency while also
> > giving us lots of vectorized kernels, at the cost of some extra CMake
> > magic. After that, it would just be a matter of making the function
> > registry point to these new functions.
> >
> > Please let me know your thoughts!
> >
> > Thanks,
> > Sasha Krassovsky
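To make the proposed strategy concrete, here is a minimal sketch of what one such translation unit could look like. The file name, the example build lines, and the int64_t addition kernel are assumptions chosen for illustration; the -DNAMESPACE define, the -mavx2/-mavx512vl flags, and the arrow::AVX2::VectorAdd naming follow the proposal above.

    // vector_add.cc (hypothetical file), compiled once per instruction set, e.g.:
    //   g++ -O3 -mavx2     -DNAMESPACE=AVX2   -c vector_add.cc -o vector_add_avx2.o
    //   g++ -O3 -mavx512vl -DNAMESPACE=AVX512 -c vector_add.cc -o vector_add_avx512.o
    #include <cstdint>

    namespace arrow {
    namespace NAMESPACE {  // expands to AVX2, AVX512, ... per compiler invocation

    // Simple element-at-a-time loop; each copy of this translation unit is
    // auto-vectorized according to the -m flags it was built with.
    void VectorAdd(const int64_t* a, const int64_t* b, int64_t* out, int64_t n) {
      for (int64_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
      }
    }

    }  // namespace NAMESPACE
    }  // namespace arrow

The function registry would then presumably choose between arrow::AVX2::VectorAdd and arrow::AVX512::VectorAdd at runtime, which lines up with the runtime-dispatch support Yibo mentions at the top of the thread.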