Kristoffer, you could try some simple "placeholder" operations, e.g. creating a new tensor from an old one by replacing only a few indices (or by adding to them). Then you can compare speed between tuples and arrays. Unless you see a speedup, this might not be worth pursuing.
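Such a placeholder benchmark might be sketched as follows. This is only an illustration of the idea, not code from any package: `replace_index` and `bench` are made-up names, and actual timings will depend heavily on the Julia version and the compiler's ability to stack allocate.

```julia
# Sketch of a "placeholder" benchmark: build a new 81-element container
# from an old one with a single entry replaced, once as a tuple and once
# as an Array, and compare timings.

# Tuple version: returns a brand-new NTuple{81}; can avoid heap allocation.
replace_index(t::NTuple{81,Float64}, i::Int, v::Float64) =
    ntuple(j -> j == i ? v : t[j], 81)

# Array version: each call allocates a fresh Vector on the heap.
function replace_index(a::Vector{Float64}, i::Int, v::Float64)
    b = copy(a)
    b[i] = v
    return b
end

# Repeatedly rebuild x with index 5 zeroed out, n times.
function bench(x, n)
    for _ in 1:n
        x = replace_index(x, 5, 0.0)
    end
    return x
end

t = ntuple(j -> Float64(j), 81)
a = collect(1.0:81.0)
@time bench(t, 10^5)   # tuple version
@time bench(a, 10^5)   # Array version: one allocation per call
```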
If the bottleneck in your code is allocating the tensors, then using (or automatically generating?) operations that re-use existing tensors might be the answer.

To reduce code duplication, you can try using the `ntuple` function and passing an actual function as its argument. If this function isn't inlined, then every operation on a 9x9 tensor is unrolled to "only" 81 operations, not to 81x9 operations. (Or maybe `ntuple` actually uses a loop internally?) If you then represent a matrix as a 9-tuple of 9-tuples, you might reduce the unrolling to 9 operations per function.

Finally, you can use `llvmcall` to generate low-level operations, and in these low-level operations you can explicitly program loops. Of course, this is likely a tad more low-level than you'd want.

I'm not aware of another way.

-erik

On Fri, Feb 26, 2016 at 3:25 AM, Kristoffer Carlsson <[email protected]> wrote:

> Botched the title a bit. Oh well.
>
> On Friday, February 26, 2016 at 9:17:54 AM UTC+1, Kristoffer Carlsson wrote:
>>
>> Hello everyone.
>>
>> To avoid the XY problem, I will first describe my actual use case. I am
>> writing a library for tensors to be used in continuum mechanics. These
>> are in dimensions 1, 2, 3 and typically of order 1, 2, 4.
>>
>> Since tensor expressions are usually chained together like (A ⊗ B) * C ⋅ a,
>> I want to make these stack allocated so I can get good performance
>> without making a bunch of mutating functions, having to preallocate
>> everything, etc.
>>
>> The way you stack allocate vector-like objects in Julia right now seems
>> to be almost exclusively with an immutable holding a tuple.
>> So my data structure right now looks similar to:
>>
>> immutable FourthOrderTensor{T} <: AbstractArray{T, 4}
>>     data::NTuple{81, T}
>> end
>>
>> Base.size(::FourthOrderTensor) = (3,3,3,3)
>> Base.linearindexing{T}(::Type{FourthOrderTensor{T}}) = Base.LinearFast()
>> Base.getindex(S::FourthOrderTensor, i::Int) = S.data[i]
>>
>> If you'd like, you can think of the NTuple{81, T} as actually
>> representing a 9x9 matrix.
>>
>> The largest operation I will need is a double contraction between two
>> fourth-order tensors, which is equivalent to multiplying a 9x9 matrix
>> with another 9x9 matrix.
>>
>> My actual question is: is there a good way to do this without generating
>> a huge amount of code?
>>
>> I have looked at, for example, the FixedSizeArrays package, but there
>> they seem to simply unroll every operation. For a 9x9 * 9x9 matrix
>> multiplication this unrolling creates a huge amount of code and doesn't
>> feel right.
>>
>> I guess what I want is pretty much
>> https://github.com/JuliaLang/julia/issues/11902 but is there any
>> workaround as of now?
>>
>> Thanks!
>>
>> // Kristoffer

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
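Erik's `ntuple` suggestion, applied to the double contraction above, might be sketched like this. It views each `NTuple{81,T}` as a 9x9 matrix in column-major order; `dcontract` and `matmul_element` are made-up names (not from any package), the `FourthOrderTensor` wrapper is dropped for brevity, and `@noinline` comes from Julia versions newer than the 0.4 of this thread:

```julia
# Each result entry is computed in a separate helper function. If the
# helper is not inlined, ntuple unrolls into 81 calls rather than 81
# fully inlined 9-term sums.

# C[i,j] = sum_k A[i,k] * B[k,j], with linear index n = i + 9*(j - 1).
@noinline function matmul_element(A::NTuple{81}, B::NTuple{81}, n::Int)
    i = (n - 1) % 9 + 1          # row of the result entry
    j = (n - 1) ÷ 9 + 1          # column of the result entry
    s = zero(eltype(A))
    for k in 1:9
        s += A[i + 9*(k - 1)] * B[k + 9*(j - 1)]
    end
    return s
end

# Double contraction of two fourth-order tensors, i.e. a 9x9 matmul.
dcontract(A::NTuple{81}, B::NTuple{81}) =
    ntuple(n -> matmul_element(A, B, n), 81)
```

Whether the compiler actually keeps `matmul_element` out-of-line, and whether this beats a plain loop over a heap-allocated array, would of course need benchmarking.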
