Right - we can always use calloc and then provide aligned memory.

Perhaps this is worth benchmarking. It is still likely to be much faster than 
malloc + memset, because it should have significantly better cache behaviour, 
even though the zeroing is not free. The question is whether that cost is 
small enough.
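
For concreteness, here is a rough sketch of the calloc-then-align idea. The 
names calloc_aligned and free_aligned are made up for illustration and are not 
anything in Julia's runtime. It over-allocates a little more than the 15 bytes 
suggested below, so that the raw pointer can be stashed just in front of the 
aligned address and handed back to free() later. Whether large blocks really 
come back lazily zeroed depends on the libc and the kernel, so treat this as 
something to benchmark rather than a guarantee.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 16 /* assumed alignment; must be a power of two */

/* Hypothetical helper: return n zeroed bytes at a 16-byte-aligned address.
   The block comes from calloc, so the libc/kernel may zero it lazily. */
void *calloc_aligned(size_t n)
{
    /* Room to round up to the next boundary, plus a slot for the raw pointer. */
    size_t extra = ALIGNMENT - 1 + sizeof(void *);
    if (n > SIZE_MAX - extra)
        return NULL; /* size would overflow */

    unsigned char *raw = calloc(1, n + extra);
    if (raw == NULL)
        return NULL;

    /* Skip past the pointer slot, then round up to a multiple of ALIGNMENT. */
    uintptr_t addr = ((uintptr_t)raw + sizeof(void *) + ALIGNMENT - 1)
                     & ~(uintptr_t)(ALIGNMENT - 1);
    void **aligned = (void **)addr;

    /* Remember the pointer calloc returned; this write touches only the
       first page of the block. */
    aligned[-1] = raw;
    return aligned;
}

/* Hypothetical counterpart: release a block obtained from calloc_aligned. */
void free_aligned(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A caller would use calloc_aligned(n) in place of posix_memalign + memset and 
must release the block with free_aligned rather than free, which is the main 
cost of this design: the pointer handed out is not the one calloc returned.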

-viral



> On 25-Nov-2014, at 9:45 am, Stefan Karpinski <[email protected]> wrote:
> 
> That's not the point – if you already have memory and have to fill it, then 
> you're not in any position for the kernel to lazily zero it, so the alignment 
> of arbitrary arrays is irrelevant. The point SGJ was making is that we want 
> to allocate the memory using something calloc-like so that the kernel can do 
> lazy zeroing for us, but we also need that memory to be 16-byte aligned, but 
> there is no portable way to get 16-byte-aligned memory that the kernel will 
> lazily zero for you. We can have lazy zeroing or 16-byte alignment but not 
> both. This makes me wonder if we couldn't just allocate 15 bytes more than 
> necessary and return the first address that is on a 16-byte boundary.
> 
> On Mon, Nov 24, 2014 at 11:02 PM, Viral Shah <[email protected]> wrote:
> To add to the point, you can also get non-aligned stuff with subarrays or 
> results from a ccall.
> 
> -viral
> 
> 
> On Tuesday, November 25, 2014 9:24:36 AM UTC+5:30, Simon Kornblith wrote:
> In general, arrays cannot be assumed to be 16-byte aligned because it's 
> always possible to create one that isn't using pointer_to_array. However, 
> from Intel's AVX introduction:
> 
> Intel® AVX has relaxed some memory alignment requirements, so now Intel AVX 
> by default allows unaligned access; however, this access may come at a 
> performance slowdown, so the old rule of designing your data to be memory 
> aligned is still good practice (16-byte aligned for 128-bit access and 
> 32-byte aligned for 256-bit access).
> 
> On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson 
> <[email protected]> wrote: 
> > Unfortunately, Julia allocates 16-byte aligned data by default (to help 
> > SIMD 
> > code), and there is no calloc version of posix_memalign as far as I know. 
> 
> The generated machine code I've seen does not make use of this. All 
> the load/store instructions in vectorized or unrolled loops assume 
> unaligned pointers. (Plus, with AVX one should align to 32 bytes 
> instead.) 
> 
> -erik 
> 
> -- 
> Erik Schnetter <[email protected]> 
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
> 
