On 4/1/20 11:23 AM, data pulverizer wrote:
Thanks for all the suggestions made so far. I am still interested in
looking at the implementation details of the slice assignment `arr[] = x`,
which I can't seem to find. Before I made my initial post, I tried doing
a `memcpy` and a `memmove` in a `for` loop, but neither improved the
performance or came close to that of the initial slice assignment, so I
didn't bother to mention them. I haven't tried them with the suggested
compiler flags, though.
Using disassembly, on run.dlang.io, it says it's using __memsetDouble.
@StevenSchveighoffer also suggested using `memset` (as well as `memcpy`).
Please correct me if I am wrong, but it looks as if `memset` can only
write from an `int`-sized source, and I need the source to be of any
potential size (T).
Again, the compiler uses whatever tools are available. It might be
memset, it might be something else.
In the case of your code, it's using __memsetDouble. I have no idea
where that is defined (probably libc).
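To make the `memset` question concrete, here is a minimal sketch (not what the compiler emits) contrasting a generic slice assignment with C's `memset`. The helper `fill` and the struct `Pair` are hypothetical names used only for illustration. The slice assignment works for any element type T, while `memset` converts its `int` argument to a single byte and repeats that byte.

```d
import core.stdc.string : memset;

// Hypothetical element type, used only for illustration.
struct Pair { int id; double weight; }

// Hypothetical helper: fill a slice of any element type T.
// The compiler picks the implementation of `arr[] = x` per type.
void fill(T)(T[] arr, T x)
{
    arr[] = x;
}

void main()
{
    auto d = new double[3];
    fill(d, 2.5);
    assert(d == [2.5, 2.5, 2.5]);

    auto ps = new Pair[2];
    fill(ps, Pair(7, 0.5));
    assert(ps[1] == Pair(7, 0.5));

    // memset takes an int, but only its low byte is written, n times.
    ubyte[4] raw;
    memset(raw.ptr, 0x1FF, raw.length);
    foreach (b; raw)
        assert(b == 0xFF);
}
```

So `memset` on its own cannot reproduce `arr[] = x` for an arbitrary T unless every byte of `x` happens to be the same.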
----------------------------------------------------------------------
On a related aside I noticed that the timing was reduced across the
board so much so that the initial slice time halved when initialising with:
```
auto arr = (cast(T*)GC.malloc(T.sizeof*n, GC.BlkAttr.NO_SCAN |
GC.BlkAttr.APPENDABLE))[0..n];
```
Instead of:
```
auto arr = new T[n];
```
What NO_SCAN means is: don't scan the block for pointers during a GC
collection cycle. If you have pointers in your T, this is a very bad idea.
Not only that, but APPENDABLE is set here without initializing the
appendable data at the end of the block. If you are going to use this,
remove the GC.BlkAttr.APPENDABLE.
In addition, GC.malloc skips T's initializer entirely (it does not write
T.init into the block), whereas new T[n] has to initialize every element.
If T has a non-zero initializer, that is going to be a lot more expensive.
In the case of double, every element is initialized to NaN.
This could explain the difference in timing.
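The difference between the two allocation styles can be sketched as follows. The helper `allocNoScan` is a hypothetical name, and it deliberately swaps in `GC.calloc` (documented to return zeroed memory) so that the contents are well defined; it is an illustration under those assumptions, not a recommendation.

```d
import core.memory : GC;
import std.math : isNaN;
import std.traits : hasIndirections;

// Hypothetical helper: pointer-free GC block of n elements, without APPENDABLE.
T[] allocNoScan(T)(size_t n)
{
    // NO_SCAN is only safe when T holds no pointers or references.
    static assert(!hasIndirections!T, "NO_SCAN would hide pointers from the GC");
    auto p = cast(T*) GC.calloc(T.sizeof * n, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

void main()
{
    auto a = new double[4];         // each element is double.init, i.e. NaN
    assert(isNaN(a[0]));

    auto b = allocNoScan!double(4); // all-zero bits, i.e. 0.0, no T.init pass
    assert(b[0] == 0.0);

    b[] = 1.5;                      // the slice assignment being timed
    assert(b[3] == 1.5);
}
```

When benchmarking, this means `new T[n]` pays for an extra initialization pass over the array that the raw allocation does not.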
I noticed that `GC.malloc()` is based on `gc_malloc()` which gives the
bit mask option that makes it faster than `core.stdc.stdlib: malloc`.
Is `gc_malloc` OS dependent? I can't find it in the standard C library,
the only reference I found for it is
[here](https://linux.die.net/man/3/gc) and it is named slightly
differently but appears to be the same function. In `core.memory`, it is
specified by the `extern (C)` declaration
(https://github.com/dlang/druntime/blob/master/src/core/memory.d) so I
guess it must be somewhere on my system?
It's in the D garbage collector, here:
https://github.com/dlang/druntime/blob/2eec30b35bab308a37298331353bdce5fee1b657/src/gc/proxy.d#L166
extern(C) functions can be implemented in D. The major difference
between standard D functions and extern(C) functions is that the latter
are not name-mangled.
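As a small illustration of the mangling point (the function names below are hypothetical), `.mangleof` shows the symbol name the compiler assigns to each function:

```d
// Hypothetical functions, for illustration only.
extern (C) int c_style_add(int a, int b) { return a + b; }
int d_style_add(int a, int b) { return a + b; }

void main()
{
    // An extern(C) function keeps its plain name as the symbol...
    static assert(c_style_add.mangleof == "c_style_add");
    // ...while a regular D function gets a mangled name that encodes
    // its module and signature.
    static assert(d_style_add.mangleof != "d_style_add");

    assert(c_style_add(2, 3) == d_style_add(2, 3));
}
```

This is why a declaration such as `extern (C) gc_malloc(...)` in one module can be resolved against a definition written in D in another module.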
-Steve