Hi Wes,

Thank you very much for the detailed reply! I will have a look at the unit tests and the Dremio code.

Cheers,
Ivan

On Sun, 27 Aug 2017 at 12:16 PM, Wes McKinney <[email protected]> wrote:

> hi Ivan,
>
> As far as "Arrow support for SIMD", the primary enabler is the
> physical memory layout. We have prioritized data locality and
> contiguousness so that vectorized operators (including ones that use
> SIMD instructions) minimize cache misses. For example, all numeric
> data in Arrow is contiguous by design.
>
> In the implementations, we recommend allocating buffers that are a
> multiple of 64 bytes so that AVX2 or AVX512 instructions can be
> utilized if available; if the buffer size is not a multiple of 64,
> then array function implementations will need to take care with the
> end of a buffer (modulo 16, 32, or 64, depending on the SIMD
> instructions used).
>
> At the moment we do not have any SIMD-enabled algorithms in the
> codebase, though this will change in time. One of the major directions
> for the C++ libraries is to develop a module of operator kernels, some
> of which have SIMD versions so that operators can dynamically dispatch
> to SIMD-accelerated kernels if the host machine supports them. This is
> the approach that has been used by PyTorch and other frameworks.
>
> Much of the Arrow development up until this point has been concerned
> with hardening format details and validating compatibility between
> implementations (e.g. so that C++/Python and Java agree on the
> contents of the data they are sending to each other); we want to make
> sure the finer details of the memory format are not changing so that
> we can shift efforts to building Arrow-native analytics and other
> kinds of higher level applications.
>
> You can have a look at the Arrow unit tests to see examples of
> constructing buffers, vectors (or arrays as they're called in C++),
> record batches, etc. Dremio (https://github.com/dremio/dremio-oss) is
> an example of a larger data processing application that uses Arrow as
> its native memory format.
>
> - Wes
>
> On Sat, Aug 26, 2017 at 7:14 PM, Ivan Sadikov <[email protected]>
> wrote:
> > Hello,
> >
> > Hope all you guys are doing well!
> > I would like to ask a question about Arrow support for SIMD, apologies
> > if it is a little bit abstract.
> >
> > How does Arrow code implement such support in Java, C++ and other
> > languages? Could you link blog posts or the actual code that explain
> > this? Is it certain flags during the compilation steps or the
> > structure of the code that exploits the technique?
> >
> > Are there any examples on GitHub that show Arrow usage? Would
> > appreciate if you could suggest some.
> >
> > Thanks a lot!
> >
> >
> > Cheers,
> >
> > Ivan
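
For reference, here is how I read the point about 64-byte multiples and taking care with the end of a buffer. This is only a rough, language-neutral sketch in plain Python, not Arrow code: the names padded_size, split_body_and_tail and chunked_sum are hypothetical helpers made up for illustration, and the inner sum() merely stands in for what would be a single SIMD instruction in a real kernel.

```python
ALIGNMENT = 64  # bytes; the buffer-size multiple recommended in the reply


def padded_size(nbytes, alignment=ALIGNMENT):
    """Round a byte count up to the next multiple of `alignment`."""
    return (nbytes + alignment - 1) // alignment * alignment


def split_body_and_tail(length, lanes):
    """Split `length` elements into (body, tail).

    `body` is the number of elements covered by full vectors of `lanes`
    elements each; `tail` is the remainder that needs separate handling.
    """
    tail = length % lanes
    return length - tail, tail


def chunked_sum(values, lanes):
    """Sum `values` in fixed-width chunks, then handle the leftover tail.

    Hypothetical kernel shape: each chunk of `lanes` elements stands in
    for one SIMD operation; the tail is processed one element at a time.
    """
    body, _tail = split_body_and_tail(len(values), lanes)
    total = 0
    for start in range(0, body, lanes):
        total += sum(values[start:start + lanes])  # "vector" step
    for i in range(body, len(values)):
        total += values[i]  # scalar tail step
    return total


if __name__ == "__main__":
    # 25 int32 values occupy 100 bytes; padded to a 64-byte multiple.
    print(padded_size(25 * 4))
    # 8 lanes would correspond to int32 values in a 32-byte register.
    print(split_body_and_tail(25, 8))
    print(chunked_sum(list(range(25)), 8))
```

If the buffer is already padded to a multiple of 64 bytes, the scalar tail loop could instead be replaced by one more full-width step over the padding, provided the padding bytes do not affect the result (for example, zeros in a sum). That is my assumption about why the padding recommendation helps, not something stated in the reply.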
