Hi Pete,

The way to optimize the tensor library for hardware with limited cache
sizes would be to

1. Reduce the size of the buffer used for the ".block()" interface. I
believe we currently try to fit it in L1, but perhaps the cache size
detection doesn't work correctly on your hardware.
2. Reduce the block sizes used in TensorContraction.

1. By default the block size is chosen such that each block fits in L1:
https://bitbucket.org/eigen/eigen/src/3cbfc2d75ecabbb0f17291d0153de6e41e568f15/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h#lines-166

 Each evaluator in an expression reports how much scratch memory it needs
to compute a block's worth of data through the getResourceRequirements()
API, e.g.:
https://bitbucket.org/eigen/eigen/src/3cbfc2d75ecabbb0f17291d0153de6e41e568f15/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h#lines-230

 These values are then merged by the executor in the calls here:
https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h#lines-185
https://bitbucket.org/eigen/eigen/src/3cbfc2d75ecabbb0f17291d0153de6e41e568f15/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h#lines-324
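
To make the arithmetic concrete, here is a rough sketch of how a target
block size could be derived from the L1 size and then merged with the
requests from the evaluators. This is not Eigen's actual code: the
BlockRequirement struct, both function names, and the "take the largest
request" merge policy are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for what one evaluator reports; not Eigen's type.
struct BlockRequirement {
  std::size_t block_total_size;  // requested block size, in coefficients
};

// Default target: as many coefficients as fit in the L1 data cache.
template <typename Scalar>
std::size_t default_block_coeffs(std::size_t l1_bytes) {
  assert(l1_bytes >= sizeof(Scalar));
  return l1_bytes / sizeof(Scalar);
}

// Assumed merge policy: start from the default, honor the largest
// request from any evaluator, and never exceed the tensor itself.
std::size_t merge_block_size(std::size_t default_coeffs,
                             std::size_t tensor_coeffs,
                             const std::vector<BlockRequirement>& reqs) {
  std::size_t merged = std::min(default_coeffs, tensor_coeffs);
  for (const BlockRequirement& r : reqs) {
    merged = std::max(merged, r.block_total_size);
  }
  return std::min(merged, tensor_coeffs);
}
```

With a 16KB L1 and float data, default_block_coeffs<float>(16 * 1024)
gives 4096 coefficients per block. If the detected L1 size is wrong on
your hardware, this is the number that ends up too large. Eigen's core
module does provide Eigen::l1CacheSize() and
Eigen::setCpuCacheSizes(l1, l2, l3); whether the tensor executor at this
revision honors such an override is something you would need to verify.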

2. The tensor contraction blocking uses a number of heuristics to choose
the block sizes and the level of parallelism. In particular, it tries to
pack the lhs into L2, and the rhs into L3.

https://bitbucket.org/eigen/eigen/src/3cbfc2d75ecabbb0f17291d0153de6e41e568f15/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h#lines-127
https://bitbucket.org/eigen/eigen/src/3cbfc2d75ecabbb0f17291d0153de6e41e568f15/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h#lines-647
https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h#lines-239
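
As a rough illustration of what "packing a panel into a cache level"
means for the block sizes, the sketch below computes how many rows (or
columns) of a packed panel of contraction depth k fit in a given cache
budget. The function panel_extent and its rounding rule are hypothetical
and much simpler than the heuristics in the files linked above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: the largest panel extent (rows of the lhs or
// columns of the rhs) such that extent * k scalars fit in budget_bytes.
// The result is rounded down to a multiple of `step` (for example the
// register tile size) and is never smaller than one step.
std::size_t panel_extent(std::size_t budget_bytes, std::size_t k,
                         std::size_t scalar_bytes, std::size_t step) {
  assert(k > 0 && scalar_bytes > 0 && step > 0);
  std::size_t fit = budget_bytes / (k * scalar_bytes);
  fit -= fit % step;          // round down to a multiple of step
  return std::max(fit, step); // always make progress
}
```

On a processor with a 16KB L1 and no L2, the lhs budget would have to
come out of L1 (or a fraction of it). For example,
panel_extent(16 * 1024, 64, sizeof(float), 8) gives 64 rows, while a
depth of k = 1024 leaves room for only the minimum of 8, which suggests
that the k dimension would need to be blocked as well on such hardware.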

I hope these pointers help.

Rasmus


On Tue, May 28, 2019 at 7:38 AM Pete Blacker <[email protected]> wrote:

> Hi there,
>
> I'm currently using the Eigen::Tensor module on a relatively small
> processor which has very limited cache: 16KB of level 1 and no level 2
> at all! I've been looking for a way to optimise the blocking of
> operations performed by Eigen for a particular block size, but I can't
> find anything so far.
>
> Is there a way to optimise the Tensor operations for this type of small
> cache?
>
> Thanks,
>
> Pete
>
