larroy commented on issue #14570: add a compiler flag to use int64 as tensor size
URL: https://github.com/apache/incubator-mxnet/pull/14570#issuecomment-484582885

To be honest, while I understand the rationale of the proposed solution, I think it would be better to avoid having yet another build flavour if possible. Could we, for example, narrow the computation at runtime when the number of dimensions doesn't overflow? Also see my previous comment above about whether we can work with memory access to make this particular computation almost as fast with the wider type. If I understood correctly, the regression is only in transposing, and the other operators are fine? If so, we could try to find a solution for this operator alone instead of adding a new build flavour.
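
To make the runtime-narrowing idea concrete, here is a minimal sketch of what I have in mind. It is not MXNet code: `Transpose2D`, `ElementCountFitsInt32`, and `TransposeDispatch` are hypothetical names, and the kernel is a naive 2-D transpose used only for illustration. The kernel is templated on the index type, and the dispatcher picks 32-bit indices whenever every flat index fits, falling back to 64-bit otherwise.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical kernel: naive 2-D transpose, templated on the index type
// so the same code can run with 32-bit or 64-bit index arithmetic.
template <typename IndexT>
void Transpose2D(const float* in, float* out, IndexT rows, IndexT cols) {
  for (IndexT r = 0; r < rows; ++r) {
    for (IndexT c = 0; c < cols; ++c) {
      out[c * rows + r] = in[r * cols + c];
    }
  }
}

// Hypothetical helper: true when rows * cols fits in int32_t.
// Uses a division so the check itself cannot overflow int64_t.
// Assumes rows and cols are non-negative.
inline bool ElementCountFitsInt32(int64_t rows, int64_t cols) {
  if (rows == 0 || cols == 0) return true;
  const int64_t max32 = std::numeric_limits<int32_t>::max();
  return rows <= max32 / cols;
}

// Hypothetical dispatcher: choose the narrow index type at runtime
// when it is safe, and the wide one only when it is actually needed.
void TransposeDispatch(const float* in, float* out,
                       int64_t rows, int64_t cols) {
  if (ElementCountFitsInt32(rows, cols)) {
    Transpose2D<int32_t>(in, out,
                         static_cast<int32_t>(rows),
                         static_cast<int32_t>(cols));
  } else {
    Transpose2D<int64_t>(in, out, rows, cols);
  }
}
```

With something along these lines, small tensors would keep the 32-bit code path and only large tensors would pay for 64-bit indexing, at the cost of compiling the kernel twice rather than shipping a separate build. Whether this recovers the transpose regression would need to be measured.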
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
