Hi Ivan,

As far as "Arrow support for SIMD" goes, the primary enabler is the physical memory layout. We have prioritized data locality and contiguity so that vectorized operators (including ones that use SIMD instructions) minimize cache misses. For example, all numeric data in Arrow is contiguous by design.

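To make the layout point concrete, here is a minimal sketch of the kind of loop a contiguous numeric buffer allows. It is not code from the Arrow codebase: `SumContiguous` and `kLanes` are hypothetical names chosen for illustration, and whether the compiler actually emits SIMD instructions for the inner loop depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel for illustration only (not an Arrow API).
// Sums `length` int32 values stored contiguously starting at `values`.
std::int64_t SumContiguous(const std::int32_t* values, std::size_t length) {
  // Assumed block size: 8 int32 values occupy 32 contiguous bytes.
  constexpr std::size_t kLanes = 8;
  std::int64_t lanes[kLanes] = {0};

  std::size_t i = 0;
  // Main loop: whole blocks only. The fixed trip count and unit stride of
  // the inner loop make it a candidate for auto-vectorization.
  for (; i + kLanes <= length; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      lanes[j] += values[i + j];
    }
  }

  // Combine the per-lane partial sums.
  std::int64_t total = 0;
  for (std::size_t j = 0; j < kLanes; ++j) {
    total += lanes[j];
  }

  // Scalar tail: the elements left over when length is not a multiple
  // of kLanes.
  for (; i < length; ++i) {
    total += values[i];
  }
  return total;
}
```

Because the values sit next to each other in memory, each block is read with sequential loads and there is no pointer chasing between elements.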
In the implementations, we recommend allocating buffers whose size is a multiple of 64 bytes so that AVX2 or AVX512 instructions can be used if available; if the buffer size is not a multiple of 64, then array function implementations will need to take care with the end of the buffer (the remainder modulo 16, 32, or 64 bytes, depending on the SIMD instructions used).

At the moment we do not have any SIMD-enabled algorithms in the codebase, though this will change in time. One of the major directions for the C++ libraries is to develop a module of operator kernels, some of which have SIMD versions, so that operators can dynamically dispatch to SIMD-accelerated kernels if the host machine supports them. This is the approach that has been used by PyTorch and other frameworks.

Much of the Arrow development up until this point has been concerned with hardening format details and validating compatibility between implementations (e.g. so that C++/Python and Java agree on the contents of the data they are sending to each other); we want to make sure the finer details of the memory format are not changing so that we can shift efforts to building Arrow-native analytics and other kinds of higher-level applications.

You can have a look at the Arrow unit tests to see examples of constructing buffers, vectors (or arrays, as they're called in C++), record batches, etc. Dremio (https://github.com/dremio/dremio-oss) is an example of a larger data processing application that uses Arrow as its native memory format.

- Wes

On Sat, Aug 26, 2017 at 7:14 PM, Ivan Sadikov <[email protected]> wrote:
> Hello,
>
> Hope all you guys are doing well!
> I would like to ask a question about Arrow support for SIMD, apologies if
> it is a little bit abstract.
>
> How does Arrow code implement such support in Java, C++ and other
> languages? Could you link blog posts or the actual code that explain this?
> Is this certain flags during compilation steps or structure of the code
> that exploits the technique?
>
> Are there any examples on GitHub that show Arrow usage? Would appreciate if
> you could suggest some.
>
> Thanks a lot!
>
>
> Cheers,
>
> Ivan
