Matrix multiplication costs O(N^3) operations but touches only O(N^2)
elements, so converting the matrices is cheap compared to the multiply
itself. Since integers with magnitude up to 2^53 can be represented
exactly as Float64 values, one approach could be to convert the matrices
to Float64, multiply them, and then convert back. The result is exact only
as long as the products and their sums also stay within that range. For
64-bit integers this might even be the fastest option allowed by common
hardware. For 32-bit integers, you could investigate whether using Float32
(exact up to 2^24) as the intermediate representation suffices.
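
A minimal sketch of the round trip, in case it helps. The name
`float_matmul` is made up for illustration and is not part of Julia or any
package, and the range check is my own conservative assumption: it rejects
inputs unless every possible partial sum is guaranteed to stay below 2^53.
It also assumes non-empty matrices.

```julia
# Hypothetical helper: multiply integer matrices via a Float64 round trip.
function float_matmul(A::AbstractMatrix{<:Integer}, B::AbstractMatrix{<:Integer})
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions differ"))

    # Conservative bound on the magnitude of any partial sum:
    # (inner dimension) * max|A| * max|B|, computed in Float64 so the
    # bound itself cannot overflow an integer type.
    maxA = maximum(x -> abs(Float64(x)), A)
    maxB = maximum(x -> abs(Float64(x)), B)
    bound = size(A, 2) * maxA * maxB
    bound < 2.0^53 || throw(ArgumentError("entries too large for an exact Float64 product"))

    C = Float64.(A) * Float64.(B)   # floating-point multiply, handled by BLAS
    return round.(Int64, C)         # every entry is an exact integer here
end

A = rand(-1000:1000, 200, 200)
B = rand(-1000:1000, 200, 200)
C = float_matmul(A, B)
```

With entries this small the result should agree with the generic integer
`A * B`; timing both with `@time` on your own matrices would show whether
the conversion overhead is worth it for your sizes.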

-erik
On Wed, Sep 21, 2016 at 7:18 PM, Lutfullah Tomak <tomaklu...@gmail.com>
wrote:
> Float matrix multiplication uses heavily optimized openblas but integer
> matrix multiplication is a generic one from julia and there is not much you
> can do to improve it a lot because not all cpus have simd multiplication
> and addition for integers.
--
Erik Schnetter <schnet...@gmail.com>
http://www.perimeterinstitute.ca/personal/eschnetter/