On Tuesday, 22 December 2015 at 17:52:52 UTC, Andrew Chapman wrote:
On Tuesday, 22 December 2015 at 17:43:00 UTC, H. S. Teoh wrote:

I wonder if the slowdown is caused by GC collection cycles (because calling to!string will allocate, and here you're making a very large number of small allocations, which is known to cause GC performance issues).

Try inserting this before the loop:

        import core.memory;
        GC.disable();

Does this make a difference in the running time?


T

Thanks! Unfortunately that actually makes it run slightly slower :-)

I dislike guessing so here are my thought on it and the method I used to get there:

First I used the following code:

    void f() {
        import std.conv;

        string s;
        foreach (i ; 0..1_000_000) {
            s = i.to!string;
        }
    }

    void g() {
        import core.memory;
        import std.conv;

        GC.disable;

        string s;
        foreach (i ; 0..1_000_000) {
            s = i.to!string;
        }
    }

    extern (C) int snprintf(char*, size_t, const char*, ...);
    void h() {
        char[10] s;
        foreach (i ; 0..1_000_000) {
            snprintf(cast(char*)s, 10, cast(char*)"%d", i);
        }
    }

    void main(string[] args) {
           f;
        // g;
        // h;
    }

Note that for h I didn't really use a string, I used a char[10]. We should keep in mind that you may want to convert it to a proper string later.

I compiled it three times (one per function) with `dmd -profile -profile=gc` and then ran it to get profiling data. I then compiled it with `dmd` while timing each execution 6 times and discarding the first time (as the first one has to heat the cache). The reason for the recompilation is that as the C code can't be profiled and profiling puts some overhead I wanted to get that
off my shoulders while timing.

Here are the average times:

f: 0.50 user 0.00 system 95% CPU 0.530 total

g: 0.49 user 0.01 system 96% CPU 0.522 total

h: 0.17 user 0.00 system 92% CPU 0.188 total

The GC profiler didn't find any allocation be it in f, g or h.

The profiler was only available for f and g with very similar results (as
they are the same function really, here is the interesting part):

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num          Tree        Func        Per
  Calls        Time        Time        Call

1000000 390439368 265416744 265 pure nothrow @safe char[] std.array.array!(std.conv.toChars!(10, char, 1, int).toChars(int).Result).array(std.conv.toChars!(10, char, 1, int).toChars(int).Result) 1000000 83224994 83224994 83 pure nothrow ref @nogc @safe std.conv.toChars!(10, char, 1, int).toChars(int).Result std.conv.toChars!(10, char, 1, int).toChars(int).Result.__ctor(int)
23666670 1182768507    73190160           3     _Dmain
1000000 525732328 35191373 35 pure nothrow @trusted immutable(char)[] std.conv.toImpl!(immutable(char)[], int).toImpl(int, uint, std.ascii.LetterCase) 5888890 43695659 33204064 5 pure nothrow ref @nogc @safe char std.conv.emplaceRef!(char, char).emplaceRef(ref char, ref char) 1000000 32745909 32745909 32 pure nothrow char[] std.array.arrayAllocImpl!(false, char[], ulong).arrayAllocImpl(ulong) 1000000 100101586 16876591 16 pure nothrow @nogc @safe std.conv.toChars!(10, char, 1, int).toChars(int).Result std.conv.toChars!(10, char, 1, int).toChars(int) 5888890 13070014 13070014 2 pure nothrow @property @nogc @safe char std.conv.toChars!(10, char, 1, int).toChars(int).Result.front() 1000000 44899444 12153535 12 pure nothrow @trusted char[] std.array.uninitializedArray!(char[], ulong).uninitializedArray(ulong) 5888890 10491595 10491595 1 pure nothrow ref @nogc @safe char std.conv.emplaceImpl!(char).emplaceImpl!(char).emplaceImpl(ref char, ref char) 6888890 9952971 9952971 1 pure nothrow @property @nogc @safe bool std.conv.toChars!(10, char, 1, int).toChars(int).Result.empty() 1000000 53086148 8186704 8 pure nothrow @trusted char[] std.array.array!(std.conv.toChars!(10, char, 1, int).toChars(int).Result).array(std.conv.toChars!(10, char, 1, int).toChars(int).Result).__lambda2() 1000000 530599849 4867521 4 pure nothrow @safe immutable(char)[] std.conv.toImpl!(immutable(char)[], int).toImpl(int) 1000000 534801618 4201769 4 pure nothrow @safe immutable(char)[] std.conv.to!(immutable(char)[]).to!(int).to(int) 5888890 2520677 2520677 0 pure nothrow @nogc @safe void std.conv.toChars!(10, char, 1, int).toChars(int).Result.popFront() 1000000 2138919 2138919 2 pure nothrow @nogc @trusted char[] std.array.array!(std.conv.toChars!(10, char, 1, int).toChars(int).Result).array(std.conv.toChars!(10, char, 1, int).toChars(int).Result).__lambda3() 1000000 558232 558232 0 pure nothrow @property @nogc @safe ulong std.conv.toChars!(10, char, 1, int).toChars(int).Result.length()


You may not be used to read such a bare listing so here are some pointers:

- Num Calls is the number of calls made to that function
- Tree Time is the total cumulative time spent in the function and its subfunctions
- Func Time is the total time spent in the function only
- Per Call is the average time spent in the function only

The functions are sorted by their Func Time. As we can see what took the most time was converting the range of chars to an array in order to store it in a
string. This is a cost that we obviously don't have in h.

It seems that using snprintf would be the fastest here although one could replicate its behaviour in D. I don't think there is anything in the standard library that would really help here as (if I read it correctly) it is mainly because of the conversion from ranges to arrays that this code is slow.

Reply via email to