> My own impression is that the emphasis may be slightly exaggerated. But
> perhaps some other benchmarks would prove differently.

This is probably true. [1] is the original mailing list discussion. I
think the lack of measurable differences and the high overhead of 64-byte
alignment were the reasons for relaxing to 8-byte alignment.

> Specifically, I performed two types of tests, a "random sum" where we
> compute the sum of the values taken at random indices, and "sum", where we
> sum all values of the array (buffer[1] of the primitive array), both for
> array ranging from 2^10 to 2^25 elements. I was expecting that, at least in
> the latter, prefetching would help, but I do not observe any difference.

I think the most likely place where this could make a difference is
operations on wider types (Decimal128 and Decimal256). Another place where
I think alignment could help is when adding two primitive arrays (it
sounds like this was summing a single array?).

[1]
https://lists.apache.org/thread.html/945b65fb4bc8bcdab695b572f9e9c2dca4cd89012fdbd896a6f2d886%401460092304%40%3Cdev.arrow.apache.org%3E

On Mon, Sep 6, 2021 at 3:05 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 06/09/2021 at 23:20, Jorge Cardoso Leitão wrote:
> > Thanks a lot Antoine for the pointers. Much appreciated!
> >
> >> Generally, it should not hurt to align allocations to 64 bytes anyway,
> >> since you are generally dealing with large enough data that the
> >> (small) memory overhead doesn't matter.
> >
> > Not for performance. However, 64 byte alignment in Rust requires
> > maintaining a custom container, a custom allocator, and the inability to
> > interoperate with `std::Vec` and the ecosystem that is based on it, since
> > std::Vec allocates with alignment T (e.g. int32), not 64 bytes. For anyone
> > interested, the background for this is this old PR [1] in this in arrow2
> > [2].
>
> I see. In the C++ implementation, we are not compatible with the default
> allocator either (but C++ allocators as defined by the standard library
> don't support resizing, which doesn't make them terribly useful for
> Arrow anyway).
>
> > Neither myself in micro benches nor Ritchie from polars (query engine) in
> > large scale benches observe any difference in the archs we have available.
> > This is not consistent with the emphasis we put on the memory alignments
> > discussion [3], and I am trying to understand the root cause for this
> > inconsistency.
>
> My own impression is that the emphasis may be slightly exaggerated. But
> perhaps some other benchmarks would prove differently.
>
> > By prefetching I mean implicit; no intrinsics involved.
>
> Well, I'm not aware that implicit prefetching depends on alignment.
>
> Regards
>
> Antoine.
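
For readers following the Rust side of this thread, here is a minimal
sketch of why 64-byte alignment needs a custom container: a `Vec<i32>` is
only guaranteed the alignment of `i32`, so an aligned buffer has to go
through `std::alloc` with an explicit `Layout`. The names `AlignedI32` and
`sum` are hypothetical and are not taken from arrow2 or arrow-rs. The sketch
only checks alignment and results, it is not a benchmark.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};

/// Hypothetical fixed-size buffer of i32 values aligned to 64 bytes.
/// Illustration only; not the container used by any Arrow implementation.
struct AlignedI32 {
    ptr: *mut i32,
    len: usize,
    layout: Layout,
}

impl AlignedI32 {
    fn from_slice(values: &[i32]) -> Self {
        // Zero-sized allocations are not handled in this sketch.
        assert!(!values.is_empty());
        let layout = Layout::from_size_align(values.len() * size_of::<i32>(), 64)
            .expect("invalid layout");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc(layout) } as *mut i32;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // SAFETY: `ptr` is valid for `values.len()` writes and cannot
        // overlap `values`, since it was freshly allocated.
        unsafe { std::ptr::copy_nonoverlapping(values.as_ptr(), ptr, values.len()) };
        Self { ptr, len: values.len(), layout }
    }

    fn as_slice(&self) -> &[i32] {
        // SAFETY: `ptr` points to `len` initialized i32 values.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for AlignedI32 {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this layout.
        unsafe { dealloc(self.ptr as *mut u8, self.layout) };
    }
}

/// The "sum" test from the thread, accumulating into i64 to avoid overflow.
fn sum(values: &[i32]) -> i64 {
    values.iter().map(|&v| v as i64).sum()
}

fn main() {
    let data: Vec<i32> = (0..1024).collect();
    let aligned = AlignedI32::from_slice(&data);

    // Vec only guarantees align_of::<i32>(); its pointer may or may not
    // happen to fall on a 64-byte boundary.
    println!("guaranteed Vec<i32> alignment: {}", align_of::<i32>());
    println!("Vec address mod 64: {}", data.as_ptr() as usize % 64);
    println!("aligned address mod 64: {}", aligned.as_slice().as_ptr() as usize % 64);

    // Both buffers hold the same values, so the sums agree.
    assert_eq!(sum(&data), sum(aligned.as_slice()));
}
```

Because `AlignedI32` owns memory with a layout that `Vec<i32>` would not use
for deallocation, it cannot be converted into a `Vec<i32>` without copying,
which is the interoperability cost mentioned above.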