kpuatamazon edited a comment on pull request #18885: URL: https://github.com/apache/incubator-mxnet/pull/18885#issuecomment-671281484
If we wanted to be really pedantic, alignment could be set based on CPUID. I suppose MXNet has a larger problem that the Intel people might want to pontificate on: the tensors might be aligned, but nobody told the compiler that (e.g. with `__attribute__((aligned(64)))`), so the kernels are still generating branches to handle unaligned data. Also, I note your benchmark doesn't include a GEMM, which is where the big costs come from. In any case, I'd like to see this merged because my code was written to depend on it.
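To illustrate the point about telling the compiler, here is a minimal sketch, assuming GCC or Clang and C++17. It is not MXNet code: `kAlignment`, `AllocAligned`, and `SumAligned` are hypothetical names made up for this example. Note that `__attribute__((aligned(64)))` annotates a variable or type declaration; for a pointer received at runtime, the usual GCC/Clang hint is `__builtin_assume_aligned`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical constant: 64 bytes matches an AVX-512 register width
// and a typical x86 cache line.
constexpr std::size_t kAlignment = 64;

// Hypothetical helper: allocate `count` floats on a 64-byte boundary.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so the byte count is rounded up.
inline float* AllocAligned(std::size_t count) {
  std::size_t bytes = count * sizeof(float);
  bytes = (bytes + kAlignment - 1) / kAlignment * kAlignment;
  return static_cast<float*>(std::aligned_alloc(kAlignment, bytes));
}

// Hypothetical kernel: sums `count` floats from a buffer the caller
// promises is 64-byte aligned.
inline float SumAligned(const float* data, std::size_t count) {
  // Debug-build check that the promise actually holds.
  assert(reinterpret_cast<std::uintptr_t>(data) % kAlignment == 0);
  // GCC/Clang builtin: lets the optimizer assume the alignment, so it
  // has no need to emit a separate path for unaligned input.
  const float* p =
      static_cast<const float*>(__builtin_assume_aligned(data, kAlignment));
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    total += p[i];
  }
  return total;
}
```

Whether the hint changes the generated code depends on the compiler, flags, and target; comparing the assembly with and without it is the way to confirm. The builtin is a promise, not a check, so passing an unaligned pointer is undefined behavior.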
