Hi Wes,

Thank you very much for the detailed reply!
I will have a look at unit tests and Dremio code.


Cheers,

Ivan
On Sun, 27 Aug 2017 at 12:16 PM, Wes McKinney <[email protected]> wrote:

> hi Ivan,
>
> As for "Arrow support for SIMD", the primary enabler is the
> physical memory layout. We have prioritized data locality and
> contiguousness so that vectorized operators (including ones that use
> SIMD instructions) minimize cache misses. For example, all numeric
> data in Arrow is contiguous by design.
>
> In the implementations, we recommend allocating buffers that are a
> multiple of 64 bytes so that AVX2 or AVX512 instructions can be
> utilized if available; if the buffer size is not a multiple of 64,
> then array function implementations will need to take care with the
> end of a buffer (modulo 16, 32, or 64, depending on the SIMD
> instructions used).
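
A minimal sketch of the padding and end-of-buffer handling described above. It is not taken from the Arrow codebase: the names PaddedSize and SumChunked are hypothetical, and the fixed 8-lane inner loop only stands in for a 32-byte SIMD register, assuming int32 data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round a byte count up to the next multiple of 64,
// the padding recommended in the text above.
constexpr std::size_t kAlignment = 64;

constexpr std::size_t PaddedSize(std::size_t nbytes) {
  return (nbytes + kAlignment - 1) / kAlignment * kAlignment;
}

// Hypothetical kernel: sums int32 values in chunks of 8 (8 x 4 bytes = 32
// bytes, the width of one AVX2 register), then finishes the leftover
// elements one at a time. The chunked loop is plain C++ standing in for
// real SIMD instructions.
std::int64_t SumChunked(const std::int32_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  constexpr std::size_t kLanes = 8;
  const std::size_t body = length - length % kLanes;  // largest multiple of 8

  std::int64_t lanes[kLanes] = {0};
  for (std::size_t i = 0; i < body; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      lanes[j] += values[i + j];
    }
  }

  std::int64_t total = 0;
  for (std::size_t j = 0; j < kLanes; ++j) {
    total += lanes[j];
  }
  // Tail: the "end of a buffer" case when length is not a multiple of 8.
  for (std::size_t i = body; i < length; ++i) {
    total += values[i];
  }
  return total;
}
```

If the allocation is padded to PaddedSize(nbytes) and the padding is zeroed, a kernel like this could in principle skip the tail loop and process the final partial chunk at full width; that is the convenience the 64-byte recommendation buys.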
>
> At the moment we do not have any SIMD-enabled algorithms in the
> codebase, though this will change in time. One of the major directions
> for the C++ libraries is to develop a module of operator kernels, some
> of which have SIMD versions so that operators can dynamically dispatch
> to SIMD-accelerated kernels if the host machine supports them. This is
> the approach that has been used by PyTorch and other frameworks.
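
One way the dynamic dispatch described above could look, as a sketch only. Nothing here is Arrow API: AddKernel, AddScalar, AddAvx2 and SelectAddKernel are hypothetical names, and the AVX2 path assumes GCC or Clang on x86, where __builtin_cpu_supports and the target attribute are available. On any other compiler or architecture the sketch falls back to the scalar kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_GNU 1
#include <immintrin.h>
#endif

// Hypothetical kernel signature: out[i] = a[i] + b[i] for i in [0, n).
using AddKernel = void (*)(const std::int32_t*, const std::int32_t*,
                           std::int32_t*, std::size_t);

// Portable fallback kernel.
void AddScalar(const std::int32_t* a, const std::int32_t* b,
               std::int32_t* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}

#ifdef SKETCH_X86_GNU
// AVX2 kernel: 8 int32 lanes per iteration, then a scalar tail loop.
// The target attribute lets this one function use AVX2 even when the
// rest of the file is compiled without -mavx2.
__attribute__((target("avx2")))
void AddAvx2(const std::int32_t* a, const std::int32_t* b,
             std::int32_t* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                        _mm256_add_epi32(va, vb));
  }
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
#endif

// Pick a kernel once, at run time, based on what the host CPU supports.
AddKernel SelectAddKernel() {
#ifdef SKETCH_X86_GNU
  if (__builtin_cpu_supports("avx2")) {
    return AddAvx2;
  }
#endif
  return AddScalar;
}
```

A caller would typically resolve the kernel once (for example, AddKernel add = SelectAddKernel();) and reuse the pointer, so the CPU check is not repeated on every call.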
>
> Much of the Arrow development up until this point has been concerned
> with hardening format details and validating compatibility between
> implementations (e.g. so that C++/Python and Java agree on the
> contents of the data they are sending to each other); we want to make
> sure the finer details of the memory format are not changing so that
> we can shift efforts to building Arrow-native analytics and other
> kinds of higher level applications.
>
> You can have a look at the Arrow unit tests to see examples of
> constructing buffers, vectors (or arrays as they're called in C++),
> record batches, etc. Dremio (https://github.com/dremio/dremio-oss) is
> an example of a larger data processing application that uses Arrow as
> its native memory format.
>
> - Wes
>
> On Sat, Aug 26, 2017 at 7:14 PM, Ivan Sadikov <[email protected]>
> wrote:
> > Hello,
> >
> > Hope all you guys are doing well!
> > I would like to ask a question about Arrow support for SIMD, apologies if
> > it is a little bit abstract.
> >
> > How does Arrow code implement such support in Java, C++ and other
> > languages? Could you link blog posts or the actual code that explain
> > this?
> > Is it certain flags during the compilation steps, or the structure of
> > the code, that exploits the technique?
> >
> > Are there any examples on GitHub that show Arrow usage? I would
> > appreciate it if you could suggest some.
> >
> > Thanks a lot!
> >
> >
> > Cheers,
> >
> > Ivan
>