If you represent a matrix as a vector of vectors, then you have one call to `ntuple` (probably unrolled 9 times), each calling another `ntuple` (probably also unrolled 9 times), which would give you 18 invocations of the function Tim just showed.
It's not clear whether the unrolling is good or bad, since it will avoid many index calculations, and will enable generating SIMD instructions. -erik On Fri, Feb 26, 2016 at 9:57 AM, Tim Holy <[email protected]> wrote: > I'm not sure there's any reason you couldn't use a loop for the "K" axis in > "M-by-K * K-by-N", but as you say the problem is that when using tuples you > have to create the whole thing at once. So you'd need 81 calls to a function > like > > @noinline function loopdot(A::Mat{M,K}, B::Mat{K,N}, Arow, Bcol) > s = zero(eltype(A))*zero(eltype(B)) > for k = 1:K > s += A[Arow, k] * B[k, Bcol] > end > s > end > > (The @noinline will help guarantee that less code is generated.) Of course, > LLVM might unroll that loop for you. > > Best, > --Tim > > On Friday, February 26, 2016 12:17:53 AM Kristoffer Carlsson wrote: >> Hello everyone. >> >> To avoid the XY problem I will first describe my actual use case. I am >> writing a library for tensors to be used in continuum mechanics. These are >> in dimensions 1,2,3 and typically of order 1,2,4. >> Since tensor expressions are usually chained together like (A ⊗ B) * C ⋅ a >> I want to make these stack allocated so I can get good performance without >> making a bunch of mutating functions and having to preallocate everything >> etc. >> >> The way you stack allocate vector like objects in julia right now seems to >> almost exclusively be with an immutable of a tuple. So my data structure >> right now looks similar to: >> >> immutable FourthOrderTensor{T} <: AbstractArray{T, 4} >> data::NTuple{81, T} >> end >> >> Base.size(::FourthOrderTensor) = (3,3,3,3) >> Base.linearindexing{T}(::Type{FourthOrderTensor{T}}) = Base.LinearFast() >> Base.getindex(S::FourthOrderTensor, i::Int) = S.data[i] >> >> If you'd like, you can think of the NTuple{81, T} as actually represent a >> 9x9 matrix. >> >> >> The largest operations I will need to do is a double contraction between >> two fourth order tensors which is equivalent to a 9x9 matrix multiplied >> with another 9x9 matrix. >> >> My actual question is, is there a good way to do this without generating a >> huge amount of code? >> >> I have looked at for example the FixedSizeArrays package but there they >> seem to simply unroll every operation. For a 9x9 * 9x9 matrix >> multiplication this unrolling creates a huge amount of code and doesn't >> feel right. >> >> I guess what I want is pretty >> much https://github.com/JuliaLang/julia/issues/11902 but is there any >> workaround as of now? >> >> Thanks! >> >> // Kristoffer > -- Erik Schnetter <[email protected]> http://www.perimeterinstitute.ca/personal/eschnetter/
