On Saturday, 26 March 2016 at 23:31:23 UTC, Alex Parrill wrote:
On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
Link to the blog post: https://maikklein.github.io/post/soa-d/
Link to the reddit discussion:
https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
I think structs-of-arrays are a lot more situational than you
make them out to be.
You say, at the end of your article, that "SoA scales much
better because you can partially access your data without
needlessly loading unrelevant data into your cache". But most
of the time, programs access struct fields close together in
time (i.e. accessing one field of a struct usually means that
you will access another field shortly). In that case, you've
now split your data across multiple cache lines; not good.
Your ENetPeer example works against you here; the
packetThrottle* variables would be split up into different
arrays, but they will likely be checked together when
throttling packets. Though admittedly, it's easy to fix; put
fields likely to be accessed together in their own struct.
The SoA approach also makes random access more inefficient and
makes it harder for objects to have identity. Again, your
ENetPeer example works against you; it's common for servers to
need to send packets to individual clients rather than
broadcasting them. With the SoA approach, you end up accessing
a tiny part of multiple arrays, and load several cache lines
containing data for ENetPeers that you don't care about (i.e.
loading irrelevant data).
I think SoA can be faster if you are commonly iterating over a
section of a dataset, but I don't think that's a common
occurrence. I definitely think it's unwarranted to conclude
that SoAs "scale much better" without noting when they scale
better, especially without benchmarks.
I will admit, though, that the template for making the
struct-of-arrays is a nice demonstration of D's templates.
The next blog post that I am writing will contain a few
benchmarks for SoA vs AoS.
But most of the time, programs access struct fields close
together in time (i.e. accessing one field of a struct usually
means that you will access another field shortly). In that
case, you've now split your data across multiple cache lines;
not good.
You can still group the data together if you always access it
together. What you wrote is actually not true for arrays, at
least the way you wrote it.
Array!Foo arr;
Iterating over 'arr' always pulls each complete Foo struct
through the cache, unless you hide parts of it behind pointers.
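A rough sketch of the grouping idea, in case it helps. The names Throttle, Peers and countThrottled are made up for illustration and are not taken from ENet; the field sizes are arbitrary assumptions.

```d
import std.stdio;

// Hypothetical group: fields that are always checked together
// stay side by side in one small struct.
struct Throttle
{
    uint limit;
    uint counter;
}

// Hypothetical SoA container: one array per group of fields.
struct Peers
{
    Throttle[] throttle;   // hot data, read together
    ubyte[256][] scratch;  // cold data, untouched while throttling
}

// Counts the peers whose counter has reached their limit.
// Only the throttle array is read here.
size_t countThrottled(ref const(Peers) peers)
{
    size_t n = 0;
    foreach (t; peers.throttle)
        if (t.counter >= t.limit)
            ++n;
    return n;
}

void main()
{
    Peers peers;
    peers.throttle = [Throttle(10, 3), Throttle(10, 12), Throttle(5, 5)];
    peers.scratch.length = peers.throttle.length;

    writeln(countThrottled(peers));
}
```

The two throttle fields still share a cache line, while the scratch buffer lives in its own array and is never touched by this loop.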
The SoA approach also makes random access more inefficient and
makes it harder for objects to have identity.
No, it actually makes it much better, because you only have to
load the relevant stuff into memory.
But you usually don't look at your objects in isolation.
AoS makes sense if you always care about all fields, as with
Array!Vector3, for example. You usually access all components of
a vector.
What you lose is the general feel of OOP.
Vector add(Vector a, Vector b);
Array!Vector vectors;
add(vectors[index1], vectors[index2]);
This really just won't work with SoA, especially if you want to
mutate the underlying data through a reference. For this you
would just use AoS.
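To make the problem concrete, here is a minimal sketch of one possible workaround. VectorsSoA and its opIndex/opIndexAssign are hypothetical, hand-written for this example, and not the template from the blog post.

```d
import std.stdio;

struct Vector
{
    float x, y, z;
}

Vector add(Vector a, Vector b)
{
    return Vector(a.x + b.x, a.y + b.y, a.z + b.z);
}

// Hypothetical SoA storage for the same data.
struct VectorsSoA
{
    float[] x, y, z;

    // Gathers one element into a temporary Vector (a copy, not a reference).
    Vector opIndex(size_t i) const
    {
        return Vector(x[i], y[i], z[i]);
    }

    // Writing back has to scatter the fields into the three arrays again.
    void opIndexAssign(Vector v, size_t i)
    {
        x[i] = v.x;
        y[i] = v.y;
        z[i] = v.z;
    }
}

void main()
{
    auto vectors = VectorsSoA([1.0f, 4.0f], [2.0f, 5.0f], [3.0f, 6.0f]);

    auto sum = add(vectors[0], vectors[1]);
    vectors[0] = sum; // explicit write-back, no reference to mutate through

    writeln(vectors[0]);
}
```

Reading works by value, but there is no single Vector in memory to take a reference to, so every mutation turns into an explicit gather and scatter.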
Btw, I have done a lot of benchmarks, and in the worst case SoA
was always as fast as AoS.
But once you actually only access partial data, SoA can
potentially be much faster.
This is what I mean by scaling.
You start with:
struct Test {
    int i;
    int j;
}
Array!Test tests;
and you have absolutely no performance problem for 'tests'
because it is just so small.
But after a few years Test will have grown much bigger.
struct Test {
    int i;
    int j;
    int[100] junk;
}
If you use SoA, you can always add fields without any performance
penalty for code that doesn't touch them; that is why I said that
it "scales" better.
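A small sketch of the difference in layout. TestsSoA and sumI are hypothetical names written by hand for this example; this shows the memory layout only and is not a benchmark.

```d
import std.stdio;

struct Test
{
    int i;
    int j;
    int[100] junk;
}

// Hypothetical hand-written SoA counterpart of Test[].
struct TestsSoA
{
    int[] i;
    int[] j;
    int[100][] junk;
}

// Sums a contiguous run of ints; with SoA this is all the loop touches.
long sumI(const(int)[] values)
{
    long total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main()
{
    // AoS: each element is now 408 bytes, so a loop that reads only 'i'
    // still strides over 'j' and 'junk' of every element.
    static assert(Test.sizeof == 408);

    TestsSoA tests;
    tests.i = [1, 2, 3];
    tests.j = [10, 20, 30];
    tests.junk.length = 3;

    // SoA: the 'i' values stay 4 bytes apart however large Test grows.
    writeln(sumI(tests.i));
}
```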
But as I have said in the blog post, you will not always replace
AoS with SoA, but you should replace AoS with SoA where it makes
sense.
I think SoA can be faster if you are commonly iterating over a
section of a dataset, but I don't think that's a common
occurrence.
This happens in games very often when you use inheritance; your
objects just grow really big the more functionality you add.
For example, if you just want to move all objects based on
velocity, you only care about Position and Velocity. You don't
have to load anything else into memory.
An entity component system really is just SoA at its core.
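As a minimal sketch of that movement example: World, Vec2 and integrate are hypothetical names, and a real entity component system would have far more machinery around this.

```d
import std.stdio;

struct Vec2
{
    float x = 0;
    float y = 0;
}

// Hypothetical component storage: one array per component.
struct World
{
    Vec2[] position;
    Vec2[] velocity;
    string[] name; // not needed for movement, never read below
}

// Moves every object by its velocity; reads only position and velocity.
void integrate(ref World world, float dt)
{
    foreach (i, ref p; world.position)
    {
        p.x += world.velocity[i].x * dt;
        p.y += world.velocity[i].y * dt;
    }
}

void main()
{
    World world;
    world.position = [Vec2(0, 0), Vec2(1, 1)];
    world.velocity = [Vec2(1, 2), Vec2(-1, 0)];
    world.name = ["a", "b"];

    integrate(world, 0.5f);
    writeln(world.position);
}
```

However many other component arrays the world gains later, this loop keeps walking just the two it needs.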