Kristoffer

You could try some simple "placeholder" operations, e.g. creating a
new tensor from an old one by replacing just a few indices (or by
adding some). Then you can compare the speed of tuples against arrays.
Unless you see a speedup, this might not be worth pursuing.
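As a sketch of such a placeholder benchmark (the function names here are mine, just for illustration): build a new 81-element value from an old one with a single entry replaced, once with a tuple and once with an array, then time both.

```julia
# Hypothetical placeholder operation, tuple version: a fresh tuple with
# entry i replaced by v. No heap allocation if the compiler cooperates.
replace_tuple(t::NTuple{81,Float64}, i::Int, v::Float64) =
    ntuple(j -> j == i ? v : t[j], 81)

# Array version of the same operation, for comparison.
function replace_array(a::Vector{Float64}, i::Int, v::Float64)
    b = copy(a)      # heap-allocates a fresh array
    b[i] = v
    return b
end
```

Running each in a hot loop (e.g. under `@time`) should show whether the tuple version actually avoids allocation and comes out faster.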

If the bottleneck in your code is allocating the tensors, then using
(or: automatically generating?) operations that re-use existing
tensors might be the answer.
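A hedged sketch of what such a tensor-reusing operation could look like (the name `dcontract!` and the plain-`Matrix` representation are my own, not your API): the double contraction of two fourth-order tensors, viewed as 9x9 matrices, written into a caller-supplied output.

```julia
# Sketch: C = A : B as a 9x9 matrix product, re-using a preallocated C
# instead of allocating a new tensor on every call.
function dcontract!(C::Matrix{Float64}, A::Matrix{Float64}, B::Matrix{Float64})
    fill!(C, 0.0)
    @inbounds for j in 1:9, k in 1:9, i in 1:9
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end
```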

To reduce code duplication, you can try using the `ntuple` function
and passing an actual function as its argument. If this function isn't
inlined, then every operation on a 9x9 tensor unrolls to "only" 81
operations, not to 81x9 operations. (Or maybe `ntuple` actually uses a
loop internally?)
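For instance (a sketch; `tmap` is a name I made up), one generic `ntuple`-based helper can back several element-wise operations, instead of each operation getting its own hand-unrolled method:

```julia
# One generic helper; the 81 indexing operations live here, once.
tmap(f, a::NTuple{81,Float64}, b::NTuple{81,Float64}) =
    ntuple(i -> f(a[i], b[i]), 81)

# Each concrete operation is then a one-liner.
tadd(a, b) = tmap(+, a, b)
tsub(a, b) = tmap(-, a, b)
```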

If you then represent a matrix as a 9-tuple of 9-tuples, you might
reduce the unrolling to 9 operations per function.
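As a sketch of that nested representation (using element-wise addition as the placeholder operation; the names are mine):

```julia
# A 9x9 "matrix" as a tuple of 9 row-tuples; each function body only
# spells out 9 ntuple elements, not 81.
const Mat9 = NTuple{9,NTuple{9,Float64}}

addrow(r, s) = ntuple(i -> r[i] + s[i], 9)
addmat(A::Mat9, B::Mat9) = ntuple(i -> addrow(A[i], B[i]), 9)
```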

Finally -- you can use `llvmcall` to generate low-level operations,
and in these low-level operations you can explicitly program loops. Of
course, this is likely a tad more low-level than you'd want.
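For reference, here is the minimal `Base.llvmcall` shape (essentially the example from the Julia documentation, adding two `Int32`s); a real tensor kernel would put an explicit loop in the LLVM IR instead of a single `add`:

```julia
# Minimal llvmcall: the string is raw LLVM IR forming a function body;
# %0 and %1 are the two Int32 arguments (%2 is the implicit entry block).
llvm_add(x::Int32, y::Int32) = Base.llvmcall("""
    %3 = add i32 %1, %0
    ret i32 %3""", Int32, Tuple{Int32,Int32}, x, y)
```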

I'm not aware of another way.

-erik


On Fri, Feb 26, 2016 at 3:25 AM, Kristoffer Carlsson
<[email protected]> wrote:
> Botched the title a bit. Oh well.
>
>
> On Friday, February 26, 2016 at 9:17:54 AM UTC+1, Kristoffer Carlsson wrote:
>>
>> Hello everyone.
>>
>> To avoid the XY problem I will first describe my actual use case. I am
>> writing a library for tensors to be used in continuum mechanics. These are
>> in dimensions 1,2,3 and typically of order 1,2,4.
>> Since tensor expressions are usually chained together like (A ⊗ B) * C ⋅ a
>> I want to make these stack allocated so I can get good performance without
>> making a bunch of mutating functions and having to preallocate everything
>> etc.
>>
>> The way you stack-allocate vector-like objects in Julia right now seems to
>> be almost exclusively with an immutable wrapping a tuple. So my data
>> structure right now looks similar to:
>>
>> immutable FourthOrderTensor{T} <: AbstractArray{T, 4}
>>     data::NTuple{81, T}
>> end
>>
>> Base.size(::FourthOrderTensor) = (3,3,3,3)
>> Base.linearindexing{T}(::Type{FourthOrderTensor{T}}) = Base.LinearFast()
>> Base.getindex(S::FourthOrderTensor, i::Int) = S.data[i]
>>
>> If you'd like, you can think of the NTuple{81, T} as actually representing
>> a 9x9 matrix.
>>
>>
>> The largest operation I will need to do is a double contraction between
>> two fourth-order tensors, which is equivalent to multiplying a 9x9 matrix
>> with another 9x9 matrix.
>>
>> My actual question is, is there a good way to do this without generating a
>> huge amount of code?
>>
>> I have looked at, for example, the FixedSizeArrays package, but there they
>> seem to simply unroll every operation. For a 9x9 * 9x9 matrix multiplication
>> this unrolling creates a huge amount of code and doesn't feel right.
>>
>> I guess what I want is pretty much
>> https://github.com/JuliaLang/julia/issues/11902 but is there any workaround
>> as of now?
>>
>> Thanks!
>>
>> // Kristoffer



-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
