larroy commented on issue #14570: add a compiler flag to use int64 as tensor size
URL: https://github.com/apache/incubator-mxnet/pull/14570#issuecomment-484581787

Thanks a lot for the detailed report and analysis @apeforest and @samskalicky. One thing missing from the data you provided is a disassembly dump of the small loop that is supposed to be slow. Could you provide one with "objdump -d" or similar? I find it surprising that the degradation would be due only to data widening; I suspect the cause is memory access rather than the wider arithmetic operation itself. Having the assembly as an additional data point would help us reach a more solid conclusion.
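To make the request concrete, here is a minimal sketch of the kind of comparison that would help. It assumes a simple element-wise add stands in for the slow loop; `add_i32` and `add_i64` are hypothetical names, not MXNet code. The two functions are identical except for the width of the index/size type, so any difference in their disassembled inner loops comes from the index width alone.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a tensor kernel: identical loops that differ
   only in the width of the index/size type. Not taken from MXNet. */

/* 32-bit index: on x86-64 the compiler may emit a sign extension
   (e.g. movslq) or promote the induction variable to 64 bits;
   the disassembly shows which it chose. */
void add_i32(float *out, const float *a, const float *b, int32_t n) {
    for (int32_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* 64-bit index: same width as a pointer, so no extension is needed
   when forming addresses. */
void add_i64(float *out, const float *a, const float *b, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Compiling this with something like "gcc -O2 -c widen.c" (the file name is arbitrary) and running "objdump -d widen.o" puts the two inner loops side by side. If they differ by only a sign extension or nothing at all, the slowdown is more likely coming from memory access patterns than from the wider arithmetic.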
