On 4/1/20 11:23 AM, data pulverizer wrote:
Thanks for all the suggestions made so far. I am still interested in looking at the implementation details of the slice assign `arr[] = x`, which I can't seem to find. Before I made my initial post, I tried doing a `memcpy` and `memmove` under a `for` loop, but that neither changed the performance nor matched that of the initial slice assignment, so I didn't bother to mention them. I haven't tried them with the suggested compiler flags, though.

Using disassembly, on run.dlang.io, it says it's using __memsetDouble.


@StevenSchveighoffer also suggested using `memset` (as well as `memcpy`). Please correct me if I am wrong, but it looks as if `memset` can only write from an `int` sized source, and I need the source size to be any potential size (T).

Again, the compiler uses whatever tools are available. It might be memset, it might be something else.


In the case of your code, it's using __memsetDouble; I have no idea where that is defined (probably libc).
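As a minimal sketch of the difference being discussed: a slice assign works for any element type and leaves the choice of fill routine to the compiler, while C's `memset` takes an `int` argument but writes it as a single byte value to every byte. The helper name `fill` is hypothetical and only for illustration.

```d
import core.stdc.string : memset;

// Hypothetical helper: fill a slice with a value of any type T.
void fill(T)(T[] arr, T x)
{
    arr[] = x; // the compiler decides how to lower this fill
}

void main()
{
    auto d = new double[4];
    fill(d, 1.5);
    assert(d == [1.5, 1.5, 1.5, 1.5]);

    // memset repeats one byte value, so it cannot write an arbitrary T.
    auto i = new int[2];
    memset(i.ptr, 1, i.length * int.sizeof);
    assert(i[0] == 0x01010101); // every byte of the int is 0x01
}
```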

----------------------------------------------------------------------

On a related aside, I noticed that the timing was reduced across the board, so much so that the initial slice time halved, when initialising with:

```
auto arr = (cast(T*)GC.malloc(T.sizeof*n, GC.BlkAttr.NO_SCAN | GC.BlkAttr.APPENDABLE))[0..n];
```

Instead of:

```
auto arr = new T[n];
```

What NO_SCAN means is: don't scan the block for pointers during a GC collect cycle. If you have pointers in your T, this is a very bad idea. Not only that, but passing APPENDABLE to GC.malloc does not initialize the appendable data at the end of the block.

In addition, GC.malloc does not run any initializer on the data (GC.calloc is the variant documented to zero the memory). If you do new T[n], and T has an initializer, it's going to be a lot more expensive.

If you are going to use this, remove the GC.BlkAttr.APPENDABLE.
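A rough sketch of what that could look like, assuming you want `NO_SCAN` only when T holds no pointers. The helper name `allocRaw` is hypothetical; `hasIndirections` comes from `std.traits`.

```d
import core.memory : GC;
import std.traits : hasIndirections;

// Hypothetical helper, not part of druntime or Phobos.
T[] allocRaw(T)(size_t n)
{
    // Skip scanning only when T contains no pointers or references.
    // APPENDABLE is deliberately left out.
    enum attr = hasIndirections!T ? GC.BlkAttr.NONE : GC.BlkAttr.NO_SCAN;
    return (cast(T*) GC.malloc(T.sizeof * n, attr))[0 .. n];
}

void main()
{
    auto arr = allocRaw!double(8);
    arr[] = 0.0; // elements are not set to T.init, so fill before reading
    assert(arr.length == 8);
    assert(arr[7] == 0.0);
}
```

Since the block is not marked appendable, treat the result as a fixed-size buffer rather than something to grow with `~=` in place.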

In the case of double, it is initialized to NaN.

This could explain the difference in timing.
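To illustrate the initialization cost, here is a small check showing that `new T[n]` writes `T.init` into every element, which for `double` is NaN rather than zero:

```d
import std.math : isNaN;

void main()
{
    auto a = new double[4]; // each element is set to double.init
    assert(isNaN(a[0]) && isNaN(a[3]));

    auto b = new int[4];    // int.init is 0
    assert(b == [0, 0, 0, 0]);
}
```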


I noticed that `GC.malloc()` is based on `gc_malloc()`, which gives the bit mask option that makes it faster than `core.stdc.stdlib: malloc`. Is `gc_malloc` OS dependent? I can't find it in the standard C library; the only reference I found for it is [here](https://linux.die.net/man/3/gc), where it is named slightly differently but appears to be the same function. In `core.memory`, it is specified by the `extern (C)` declaration (https://github.com/dlang/druntime/blob/master/src/core/memory.d), so I guess it must be somewhere on my system?

It's in the D garbage collector, here: https://github.com/dlang/druntime/blob/2eec30b35bab308a37298331353bdce5fee1b657/src/gc/proxy.d#L166

extern(C) functions can be implemented in D. The major difference between standard D functions and extern(C) is that the latter does not do name mangling.
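A small sketch of that point, using made-up function names: both functions below are written in D, but only the `extern(C)` one keeps its plain identifier as the symbol name reported by `.mangleof`.

```d
// Hypothetical function names, for illustration only.
extern(C) int cAdd(int a, int b) { return a + b; } // implemented in D, C linkage
int dAdd(int a, int b) { return a + b; }           // ordinary D linkage

void main()
{
    // C linkage: the mangled name is just the identifier.
    static assert(cAdd.mangleof == "cAdd");
    // D linkage: the mangled name encodes module and signature.
    static assert(dAdd.mangleof[0 .. 2] == "_D");
    assert(cAdd(2, 3) == 5);
}
```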

-Steve
