>
> My own impression is that the emphasis may be slightly exaggerated. But
> perhaps some other benchmarks would prove otherwise.


This is probably true. [1] is the original mailing list discussion. I
think the lack of measurable differences, together with the high overhead
of 64-byte alignment, was the reason for relaxing the requirement to 8-byte
alignment.
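
Assuming part of that overhead comes from padding each buffer out to the alignment boundary, here is a small Rust sketch that rounds a buffer length up to a multiple of 8 or 64 bytes. It is only an illustration, not code from [1] or from any Arrow implementation; the function name `padded_len` and the buffer sizes are hypothetical.

```rust
/// Round `len` up to the next multiple of `align` (which must be a power of two).
/// Hypothetical helper, for illustration only.
fn padded_len(len: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (len + align - 1) & !(align - 1)
}

fn main() {
    // Hypothetical buffer sizes in bytes: a validity bitmap for 10 rows (2 bytes),
    // 10 i32 values (40 bytes), and 1000 i64 values (8000 bytes).
    for &len in &[2usize, 40, 8000] {
        let p8 = padded_len(len, 8);
        let p64 = padded_len(len, 64);
        println!("{len:>5} bytes -> {p8:>5} (8-byte) / {p64:>5} (64-byte)");
    }
}
```

For these sizes, 2 bytes becomes 8 or 64, 40 bytes becomes 40 or 64, and 8000 bytes is already a multiple of both, so the relative cost of 64-byte padding only matters for small buffers.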

> Specifically, I performed two types of tests: a "random sum", where we
> compute the sum of the values taken at random indices, and a "sum", where we
> sum all values of the array (buffer[1] of the primitive array), both for
> arrays ranging from 2^10 to 2^25 elements. I was expecting that, at least in
> the latter, prefetching would help, but I do not observe any difference.
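
A rough Rust sketch of the two tests as described, for anyone who wants to reproduce them. This is a hypothetical reconstruction, not the actual benchmark code: it assumes `i32` values, uses a hand-rolled xorshift generator (to avoid external crates) for the random indices, `std::hint::black_box` to keep the compiler from optimizing the work away, and a single timed run per size. A real benchmark should repeat each measurement, e.g. with a harness such as criterion.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Minimal xorshift64 PRNG so the sketch needs no external crates (assumption).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// "sum": add every value of the buffer sequentially.
fn sum(values: &[i32]) -> i64 {
    values.iter().map(|&v| v as i64).sum()
}

/// "random sum": add the values found at precomputed random indices.
fn random_sum(values: &[i32], indices: &[usize]) -> i64 {
    indices.iter().map(|&i| values[i] as i64).sum()
}

fn main() {
    // The tests described above went from 2^10 up to 2^25 elements;
    // this sketch stops at 2^20 to keep memory use small.
    for exp in [10u32, 15, 20] {
        let n = 1usize << exp;
        let values: Vec<i32> = (0..n as i32).collect();
        let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
        let indices: Vec<usize> = (0..n)
            .map(|_| (rng.next_u64() % n as u64) as usize)
            .collect();

        let t = Instant::now();
        let s = black_box(sum(black_box(&values)));
        let t_sum = t.elapsed();

        let t = Instant::now();
        let r = black_box(random_sum(black_box(&values), black_box(&indices)));
        let t_rand = t.elapsed();

        println!("2^{exp}: sum={s} in {t_sum:?}, random sum={r} in {t_rand:?}");
    }
}
```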


The most likely place where I think this could make a difference is
operations on wider types (Decimal128 and Decimal256). Another place where
I think alignment could help is adding two primitive arrays (it sounds like
your benchmark was summing a single array?).
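
As a sketch of the two-array case, assuming Decimal128 values are held as `i128` and a 64-byte cache line (typical of x86-64), the hypothetical helpers below add two buffers element-wise and report where each buffer starts within a cache line. With `std::Vec` those offsets are whatever the allocator returns; the usual argument for 64-byte alignment is that all three buffers would then start on a line boundary, so aligned vector loads and stores never straddle two lines.

```rust
/// Element-wise addition of two equal-length slices into a new Vec (hypothetical helper).
/// `wrapping_add` is used so overflow does not panic in debug builds.
fn add_arrays(a: &[i128], b: &[i128]) -> Vec<i128> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Offset of a buffer's start address within an assumed 64-byte cache line.
fn cache_line_offset<T>(buf: &[T]) -> usize {
    buf.as_ptr() as usize % 64
}

fn main() {
    let a: Vec<i128> = (0..1000).collect();
    let b: Vec<i128> = (0..1000).map(|x| x * 2).collect();
    let out = add_arrays(&a, &b);
    // With std::Vec nothing guarantees these offsets are 0; a 64-byte-aligned
    // allocator would make all three buffers start on a cache-line boundary.
    println!(
        "offsets within a 64-byte line: a={} b={} out={}",
        cache_line_offset(&a),
        cache_line_offset(&b),
        cache_line_offset(&out)
    );
    println!("out[999] = {}", out[999]);
}
```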

[1]
https://lists.apache.org/thread.html/945b65fb4bc8bcdab695b572f9e9c2dca4cd89012fdbd896a6f2d886%401460092304%40%3Cdev.arrow.apache.org%3E

On Mon, Sep 6, 2021 at 3:05 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 06/09/2021 at 23:20, Jorge Cardoso Leitão wrote:
> > Thanks a lot Antoine for the pointers. Much appreciated!
> >
> >> Generally, it should not hurt to align allocations to 64 bytes anyway,
> >> since you are generally dealing with large enough data that the
> >> (small) memory overhead doesn't matter.
> >
> > Not for performance. However, 64-byte alignment in Rust requires
> > maintaining a custom container and a custom allocator, and it rules out
> > interoperating with `std::Vec` and the ecosystem built on it, since
> > `std::Vec` allocates with the alignment of T (e.g. int32), not 64 bytes.
> > For anyone interested, the background for this is this old PR [1] in
> > arrow2 [2].
>
> I see. In the C++ implementation, we are not compatible with the default
> allocator either (but C++ allocators as defined by the standard library
> don't support resizing, which doesn't make them terribly useful for
> Arrow anyway).
>
> > Neither I (in micro-benchmarks) nor Ritchie from polars (a query engine,
> > in large-scale benchmarks) observes any difference on the architectures
> > we have available. This is not consistent with the emphasis we put on
> > memory alignment in the discussion [3], and I am trying to understand
> > the root cause of this inconsistency.
>
> My own impression is that the emphasis may be slightly exaggerated. But
> perhaps some other benchmarks would prove otherwise.
>
> > By prefetching I mean implicit prefetching; no intrinsics are involved.
>
> Well, I'm not aware that implicit prefetching depends on alignment.
>
> Regards
>
> Antoine.
>
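
Regarding Jorge's point about `std::Vec`: a minimal sketch of what a custom 64-byte-aligned container involves in Rust, using only `std::alloc`. `AlignedBuffer` is a hypothetical type, not the actual arrow2 or arrow-rs implementation. The crux is that `Vec::from_raw_parts` requires the allocation's alignment to equal the alignment of `T`, because memory must be deallocated with the same layout it was allocated with, so such a buffer cannot be handed to code expecting a `Vec<i32>` without copying.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::mem::align_of;

/// Hypothetical minimal owner of a zeroed, 64-byte-aligned byte buffer.
/// It cannot be turned into a `Vec<i32>`: `Vec::from_raw_parts` requires the
/// allocation's alignment to equal `align_of::<i32>()`.
struct AlignedBuffer {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuffer {
    const ALIGN: usize = 64;

    fn new_zeroed(len: usize) -> Self {
        // Allocate at least one byte: allocating a zero-size layout is UB.
        let layout = Layout::from_size_align(len.max(1), Self::ALIGN).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuffer { ptr, layout }
    }

    fn as_ptr(&self) -> *const u8 {
        self.ptr
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr, self.layout) };
    }
}

fn main() {
    let v: Vec<i32> = vec![0; 1024];
    let buf = AlignedBuffer::new_zeroed(1024 * std::mem::size_of::<i32>());

    println!("align_of::<i32>() = {}", align_of::<i32>());
    println!("Vec<i32> start % 64 = {}", v.as_ptr() as usize % 64);
    println!("AlignedBuffer start % 64 = {}", buf.as_ptr() as usize % 64);
}
```

Unlike the standard C++ allocator interface, Rust's `std::alloc` also offers `realloc`, so a container like this can grow in place when the allocator allows it, but it still has to carry its own `Layout` rather than reuse `Vec`.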
