Thanks for your reply, Erik. You always have good suggestions.

On Friday, February 26, 2016 at 2:10:48 PM UTC+1, Erik Schnetter wrote:
>
> Kristoffer
>
> You could try some simple "placeholder" operations, such as e.g.
> creating a new tensor from an old tensor by only replacing a few
> indices (or by adding them). Then you can compare speed between tuples
> and arrays. Unless you see a speedup, this might not be worth
> pursuing.
>
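For concreteness, this is roughly the kind of placeholder operation I tried, replacing one element of a tuple versus mutating an array (the helper name here is made up):

```julia
# Build a new tuple with element i replaced by x -- no mutation, and for
# small tuples the compiler can often keep everything in registers.
replace_index(t, x, i) = ntuple(j -> j == i ? x : t[j], length(t))

t = (1.0, 2.0, 3.0)
t2 = replace_index(t, 10.0, 2)   # (1.0, 10.0, 3.0)

# The array version mutates in place, but the array itself is heap allocated:
a = [1.0, 2.0, 3.0]
a[2] = 10.0
```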
I looked a bit at https://github.com/StephenVavasis/Modifyfield.jl and it seems the compiler is quite good at removing redundant writes there.

> If the bottleneck in your code is allocating the tensors, then using
> (or: automatically generating?) operations that re-use existing
> tensors might be the answer.
>

Yes, the bottleneck is allocation. Mutating functions are definitely a possibility, but then you need to pass in a buffer, and the code gets a lot uglier because you basically give up infix operators. Compare (exaggerated for effect)

(A ⊗ B) * C ⋅ a

and

dot!(ABCA_buff, dcontract!(ABC_buffer, C, otimes!(AB_buffer, A, B)), a)

It is of course possible to create a dict to hold temporary arrays of the correct shape and type; however, you need to be careful so you don't get wrong results due to aliasing. I have also found that caching arrays in a dict is often quite slow. Maybe this whole thing would be solved by https://github.com/JuliaLang/julia/pull/8134 combined with https://github.com/JuliaLang/julia/pull/12205.

> To reduce code duplication, you can try using the `ntuple` function,
> and passing an actual function as argument. If this function isn't
> inlined, then every operation on a 9x9 tensor is unrolled to "only" 81
> operations, not to 81x9 operations. (Or maybe ntuple actually uses a
> loop internally?)
>
> If you then represent a matrix as a 9-tuple of 9-tuples, you might
> reduce unrolling to 9 operations per function.
>

Yes, maybe computing one column at a time using a nested tuple is the way to go. Like you say, it would give 81 operations and not the roughly 700 I currently get. I think this is actually how FixedSizeArrays does it.

> Finally -- you can use `llvmcall` to generate low-level operations,
> and in these low-level operations you can explicitly program loops. Of
> course, this is likely a tad more low-level than you'd want.
>
> I'm not aware of another way.
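To make the nested-tuple idea concrete, here is a minimal 3x3 sketch (names are made up; the 9x9 case is analogous), with the matrix stored as a tuple of column tuples so each result column is computed by one helper call:

```julia
# 3x3 sketch: a "matrix" is a 3-tuple of 3-tuples, one tuple per column,
# so A[k][i] is the entry in row i, column k.
matvec(A, x) = ntuple(i -> A[1][i]*x[1] + A[2][i]*x[2] + A[3][i]*x[3], 3)

# Matrix-matrix product, one result column at a time: if matvec is not
# inlined, the unrolling stays per column instead of one giant blob.
matmul(A, B) = ntuple(k -> matvec(A, B[k]), 3)

A = ((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0))
matvec(A, (1.0, 1.0, 1.0))   # (12.0, 15.0, 18.0)
```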
>
> -erik
>
> On Fri, Feb 26, 2016 at 3:25 AM, Kristoffer Carlsson
> <[email protected]> wrote:
> > Botched the title a bit. Oh well.
> >
> > On Friday, February 26, 2016 at 9:17:54 AM UTC+1, Kristoffer Carlsson wrote:
> >>
> >> Hello everyone.
> >>
> >> To avoid the XY problem I will first describe my actual use case. I am
> >> writing a library for tensors to be used in continuum mechanics. These are
> >> in dimensions 1, 2, 3 and typically of order 1, 2, 4.
> >> Since tensor expressions are usually chained together like (A ⊗ B) * C ⋅ a,
> >> I want to make these stack allocated so I can get good performance without
> >> making a bunch of mutating functions and having to preallocate everything,
> >> etc.
> >>
> >> The way you stack allocate vector-like objects in Julia right now seems to
> >> almost exclusively be with an immutable wrapping a tuple. So my data
> >> structure right now looks similar to:
> >>
> >> immutable FourthOrderTensor{T} <: AbstractArray{T, 4}
> >>     data::NTuple{81, T}
> >> end
> >>
> >> Base.size(::FourthOrderTensor) = (3, 3, 3, 3)
> >> Base.linearindexing{T}(::Type{FourthOrderTensor{T}}) = Base.LinearFast()
> >> Base.getindex(S::FourthOrderTensor, i::Int) = S.data[i]
> >>
> >> If you'd like, you can think of the NTuple{81, T} as actually representing
> >> a 9x9 matrix.
> >>
> >> The largest operation I will need to do is a double contraction between
> >> two fourth order tensors, which is equivalent to multiplying a 9x9 matrix
> >> with another 9x9 matrix.
> >>
> >> My actual question is: is there a good way to do this without generating
> >> a huge amount of code?
> >>
> >> I have looked at, for example, the FixedSizeArrays package, but there they
> >> seem to simply unroll every operation. For a 9x9 * 9x9 matrix
> >> multiplication this unrolling creates a huge amount of code and doesn't
> >> feel right.
> >> I guess what I want is pretty much
> >> https://github.com/JuliaLang/julia/issues/11902, but is there any
> >> workaround as of now?
> >>
> >> Thanks!
> >>
> >> // Kristoffer
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
