On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:
On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:

It looks like all the `sweep_XXX` functions are only defined for contiguous slices, as that would be the default if you define a `Slice!(T, N)`.

How the functions access the data makes a big difference. If you compare the `sweep_field` version with the `sweep_naive` version, `sweep_field` reaches each element through a single index, whereas `sweep_naive` has to use two indices in the 2D version and three in the 3D version.
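To make the one-index versus two-index difference concrete, here is a minimal sketch in plain D (no Mir). It assumes the sweep is a Jacobi-style 5-point stencil on a 2D grid, and `sweepNaive` and `sweepField` are hypothetical names, not the benchmark's code. The naive version indexes a jagged `double[][]` with two subscripts; the field-style version keeps the grid in one flat row-major buffer and computes a single index `i * n + j` itself.

```d
import std.stdio : writeln;

// Naive style: jagged array, two subscripts per access (u[i][j]).
// Writes the average of the four neighbours of each interior point into v.
void sweepNaive(const double[][] u, double[][] v)
{
    foreach (i; 1 .. u.length - 1)
        foreach (j; 1 .. u[i].length - 1)
            v[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
}

// Field style: one flat row-major buffer; element (i, j) lives at i * n + j,
// so every access is a single index into contiguous memory.
void sweepField(const double[] u, double[] v, size_t m, size_t n)
{
    foreach (i; 1 .. m - 1)
        foreach (j; 1 .. n - 1)
        {
            immutable k = i * n + j;
            v[k] = 0.25 * (u[k - n] + u[k + n] + u[k - 1] + u[k + 1]);
        }
}

void main()
{
    enum size_t m = 4;
    enum size_t n = 5;
    auto jagged = new double[][](m, n);
    auto vJagged = new double[][](m, n);
    auto flat = new double[m * n];
    auto vFlat = new double[m * n];

    // D default-initializes doubles to NaN, so zero the outputs first.
    vFlat[] = 0.0;
    foreach (row; vJagged)
        row[] = 0.0;

    // Same values in both layouts.
    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
        {
            jagged[i][j] = i * n + j;
            flat[i * n + j] = i * n + j;
        }

    sweepNaive(jagged, vJagged);
    sweepField(flat, vFlat, m, n);

    foreach (i; 0 .. m)
        foreach (j; 0 .. n)
            assert(vJagged[i][j] == vFlat[i * n + j]);
    writeln(vFlat);
}
```

Both functions compute the same thing; the difference is only in how the address of each neighbour is formed.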

Also, the main difference in the NDSlice version is that it uses *built-in* MIR functionality, like how `sweep_ndslice` uses the `each` function from MIR, whereas `sweep_field` uses a for loop. I think this is partially to show that the built-in MIR functionality is as fast as if you tried to do it with a for loop yourself.

I see; looking at some of the code, the field case is literally doing the index calculation right there. I guess ndslice is doing the same thing, just with "Mir magic" in the background?
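As a rough illustration of what such a view does behind `opIndex`, here is a hypothetical 2D struct over a flat buffer that stores lengths and strides and computes `i * stride0 + j * stride1`. This is not Mir's implementation and `View2D` is an invented name; it only shows that the arithmetic is the same offset calculation the field version writes by hand. For a contiguous row-major layout the strides are `(n, 1)`, which reduces to `i * n + j`.

```d
import std.stdio : writeln;

// Hypothetical minimal 2D view over a flat buffer (NOT Mir's code).
struct View2D
{
    double[] data;      // underlying contiguous storage
    size_t[2] lengths;  // number of rows, number of columns
    size_t[2] strides;  // elements to skip per step along each dimension

    // Returns by ref, so a[i, j] = x writes through to data.
    ref double opIndex(size_t i, size_t j)
    {
        assert(i < lengths[0] && j < lengths[1]);
        return data[i * strides[0] + j * strides[1]];
    }
}

void main()
{
    auto buf = new double[12];
    foreach (k, ref x; buf)
        x = k;                              // buf = [0, 1, ..., 11]

    // Row-major 3x4 view: strides (4, 1), i.e. the usual i * 4 + j.
    auto a = View2D(buf, [3, 4], [4, 1]);
    // Transposed 4x3 view of the same memory: strides swapped to (1, 4).
    auto t = View2D(buf, [4, 3], [1, 4]);

    writeln(a[1, 2], " ", t[2, 1]);         // both read buf[6]
    a[1, 2] = -1;                           // write through the first view
    writeln(t[2, 1]);                       // visible through the second
}
```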

sweep_ndslice uses (2*N - 1) arrays to index U; this allows LDC to unroll the loop.

More details here
https://forum.dlang.org/post/[email protected]
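One way to read the (2*N - 1) figure, assuming the sweep is a 5-point (2D) or 7-point (3D) stencil: in 2D each interior row needs three 1D rows of U (above, centre, below), and in 3D it needs five (the centre line, which also supplies the k-1 and k+1 neighbours, plus the four lines at i-1, i+1, j-1 and j+1). The sketch below is plain D without Mir, covers only the 2D case, and `sweepRows` is a hypothetical name. It illustrates why the inner loop becomes a single-counter loop over plain arrays, the kind of loop an optimizer can unroll and vectorize; it does not reproduce `sweep_ndslice` itself.

```d
import std.stdio : writeln;

// For each interior row i, take 2*N - 1 = 3 one-dimensional row views of u,
// then run a simple inner loop with one counter j over those arrays.
void sweepRows(const double[] u, double[] v, size_t m, size_t n)
{
    foreach (i; 1 .. m - 1)
    {
        const up  = u[(i - 1) * n .. i * n];       // row i - 1
        const mid = u[i * n .. (i + 1) * n];       // row i (gives j - 1, j + 1)
        const dn  = u[(i + 1) * n .. (i + 2) * n]; // row i + 1
        auto dst  = v[i * n .. (i + 1) * n];       // output row, aliases v
        foreach (j; 1 .. n - 1)
            dst[j] = 0.25 * (up[j] + dn[j] + mid[j - 1] + mid[j + 1]);
    }
}

void main()
{
    enum size_t m = 4;
    enum size_t n = 5;
    auto u = new double[m * n];
    foreach (k, ref x; u)
        x = k;
    auto v = new double[m * n];
    v[] = 0.0;                 // doubles default to NaN in D
    sweepRows(u, v, m, n);
    writeln(v);
}
```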

I'm still not sure why slice is so slow. Doesn't that completely rely on the opSlice implementations? The choice of indexing method and underlying data structure?

sweep_slice is slower because it iterates over the data in several loops rather than in a single one. For small matrices this makes the JMP/FLOP ratio higher; for large matrices that can't fit into the CPU cache, it is less memory-efficient.
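A minimal sketch of the multi-pass versus single-pass contrast for one row of a 5-point stencil, in plain D. It is not the `sweep_slice` code; `rowMultiPass` and `rowFused` are hypothetical names. Each array-wise statement in `rowMultiPass` is its own loop, so the loop overhead is paid five times and `dst` is re-read and re-written on every pass, while `rowFused` touches each element once. (A single D array expression such as `dst[] = a[] + b[]` is evaluated in one loop; it is the separate statements that create separate passes.)

```d
import std.stdio : writeln;

// Multi-pass: five separate loops over the same output row.
void rowMultiPass(const double[] up, const double[] mid, const double[] dn, double[] dst)
{
    immutable n = dst.length;
    dst[1 .. n - 1] = up[1 .. n - 1];
    dst[1 .. n - 1] += dn[1 .. n - 1];
    dst[1 .. n - 1] += mid[0 .. n - 2];
    dst[1 .. n - 1] += mid[2 .. n];
    dst[1 .. n - 1] *= 0.25;
}

// Fused: one loop; each input is read once and each output written once.
void rowFused(const double[] up, const double[] mid, const double[] dn, double[] dst)
{
    foreach (j; 1 .. dst.length - 1)
        dst[j] = 0.25 * (up[j] + dn[j] + mid[j - 1] + mid[j + 1]);
}

void main()
{
    enum size_t n = 8;
    auto up = new double[n];
    auto mid = new double[n];
    auto dn = new double[n];
    foreach (j; 0 .. n)
    {
        up[j] = j;
        mid[j] = 2.0 * j;
        dn[j] = 3.0 * j;
    }
    auto a = new double[n];
    auto b = new double[n];
    a[] = 0.0;
    b[] = 0.0;
    rowMultiPass(up, mid, dn, a);
    rowFused(up, mid, dn, b);
    assert(a == b);            // same additions in the same order
    writeln(b);
}
```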

