Hi Ivan,

As far as "Arrow support for SIMD" goes, the primary enabler is the
physical memory layout. We have prioritized data locality and
contiguity so that vectorized operators (including ones that use
SIMD instructions) minimize cache misses. For example, all numeric
data in Arrow is contiguous by design.

In the implementations, we recommend allocating buffers that are a
multiple of 64 bytes so that AVX2 or AVX512 instructions can be
utilized if available; if the buffer size is not a multiple of 64,
then array function implementations will need to take care with the
end of a buffer (modulo 16, 32, or 64, depending on the SIMD
instructions used).
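
To make the padding and tail handling concrete, here is a minimal
sketch in C++. It is not code from the Arrow codebase: the names
kAlignment, RoundUpToMultipleOf64 and SumInt32 are hypothetical, and
the inner loop is plain scalar code standing in for what a SIMD kernel
would do on each 64-byte chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed padding granularity, taken from the 64-byte recommendation above.
constexpr std::size_t kAlignment = 64;

// Hypothetical helper: round a byte count up to the next multiple of 64.
inline std::size_t RoundUpToMultipleOf64(std::size_t nbytes) {
  return (nbytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Hypothetical kernel: sum int32 values, processing full 64-byte chunks
// first and then the leftover elements at the end of the buffer.
inline std::int64_t SumInt32(const std::int32_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  constexpr std::size_t kChunk = kAlignment / sizeof(std::int32_t);  // 16 values
  std::int64_t total = 0;
  std::size_t i = 0;
  // Main loop: each iteration covers exactly one 64-byte chunk. A SIMD
  // kernel would replace the inner loop with vector loads and adds.
  for (; i + kChunk <= length; i += kChunk) {
    for (std::size_t j = 0; j < kChunk; ++j) {
      total += values[i + j];
    }
  }
  // Tail loop: needed only when the length is not a multiple of the chunk.
  for (; i < length; ++i) {
    total += values[i];
  }
  return total;
}
```

If the allocation is padded with RoundUpToMultipleOf64, a kernel can
read whole chunks without running past the end of the allocation,
though the padding bytes then have to be zeroed or masked out so they
do not affect the result.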

At the moment we do not have any SIMD-enabled algorithms in the
codebase, though this will change in time. One of the major directions
for the C++ libraries is to develop a module of operator kernels, some
of which will have SIMD versions, so that operators can dynamically
dispatch to SIMD-accelerated kernels if the host machine supports them. This is
the approach that has been used by PyTorch and other frameworks.
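
As a rough illustration of dynamic dispatch, here is a sketch in C++.
None of it exists in Arrow today: SumKernel, SumScalar,
SumAvx2Placeholder, HostSupportsAvx2 and ChooseSumKernel are
hypothetical names, and the "AVX2" kernel is a scalar stand-in rather
than real intrinsics. The CPU check relies on the GCC/Clang builtin
__builtin_cpu_supports and falls back to the scalar path elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Common signature shared by every version of the kernel.
using SumKernel = std::int64_t (*)(const std::int32_t*, std::size_t);

// Portable fallback that works on any host.
inline std::int64_t SumScalar(const std::int32_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  std::int64_t total = 0;
  for (std::size_t i = 0; i < length; ++i) {
    total += values[i];
  }
  return total;
}

// Placeholder for a SIMD version. A real kernel would use AVX2 intrinsics
// and be compiled with the matching target flags; this one just reuses
// the scalar code so the sketch runs on any machine.
inline std::int64_t SumAvx2Placeholder(const std::int32_t* values,
                                       std::size_t length) {
  return SumScalar(values, length);
}

// Runtime CPU feature check (GCC/Clang on x86 only; assumed false elsewhere).
inline bool HostSupportsAvx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// Pick the kernel once; callers then invoke it through the function pointer.
inline SumKernel ChooseSumKernel() {
  return HostSupportsAvx2() ? SumAvx2Placeholder : SumScalar;
}
```

An operator would call ChooseSumKernel() once at startup and reuse the
returned pointer, so the feature check is not repeated per batch.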

Much of the Arrow development up until this point has been concerned
with hardening format details and validating compatibility between
implementations (e.g. so that C++/Python and Java agree on the
contents of the data they are sending to each other); we want to make
sure the finer details of the memory format are not changing so that
we can shift efforts to building Arrow-native analytics and other
kinds of higher level applications.

You can have a look at the Arrow unit tests to see examples of
constructing buffers, vectors (or arrays as they're called in C++),
record batches, etc. Dremio (https://github.com/dremio/dremio-oss) is
an example of a larger data processing application that uses Arrow as
its native memory format.

- Wes

On Sat, Aug 26, 2017 at 7:14 PM, Ivan Sadikov <[email protected]> wrote:
> Hello,
>
> Hope all you guys are doing well!
> I would like to ask a question about Arrow support for SIMD, apologies if
> it is a little bit abstract.
>
> How does Arrow code implement such support in Java, C++ and other
> languages? Could you link blog posts or the actual code that explain this?
> Is this certain flags during compilation steps or structure of the code
> that exploits the technique?
>
> Are there any examples on GitHub that show Arrow usage? Would appreciate if
> you could suggest some.
>
> Thanks a lot!
>
>
> Cheers,
>
> Ivan
