This is the first part of a function to convert to base 58 (some letters are missing, like the upper case "I") used in the Bitcoin protocol:

alias Address = ubyte[1 + 4 + RIPEMD160_digest_len];

char[] toBase58(ref Address a) pure nothrow @safe {
    static immutable symbols = "123456789" ~
                               "ABCDEFGHJKLMNPQRSTUVWXYZ" ~
                               "abcdefghijkmnopqrstuvwxyz";
    static assert(symbols.length == 58);

    auto result = new typeof(return)(34);
    foreach_reverse (ref ri; result) {
        uint c = 0;
        foreach (ref ai; a) {
            c = c * 256 + ai;
            ai = cast(ubyte)(c / symbols.length);
            c %= symbols.length;
        }
        ri = symbols[c];
    }
    ...
}


The D type system isn't smart enough to see that "ai" is always fitting in an ubyte, so I have had to use a cast(ubyte). But casts are dangerous and their usage should be minimized, and to!ubyte is slow and makes the function not nothrow. So I've rewritten the code like this with a bit of algebraic rewriting:


char[] toBase58(ref Address a) pure nothrow @safe {
    static immutable symbols = "123456789" ~
                               "ABCDEFGHJKLMNPQRSTUVWXYZ" ~
                               "abcdefghijkmnopqrstuvwxyz";
    static assert(symbols.length == 58);

    auto result = new typeof(return)(34);
    foreach_reverse (ref ri; result) {
        uint c = 0;
        foreach (ref ai; a) {
            immutable d = (c % symbols.length) * 256 + ai;
            ai = d / symbols.length;
            c = d;
        }
        ri = symbols[c % symbols.length];
    }
    ...
}


Now it can be a little slower because the integer division and modulus has different divisors, so perhaps they can't be implemented with a little more than a single division, as before (I have not compared the assembly), but for the purposes of this code the performance difference is not a problem. Now the D type system is able to see that "ai" is always fitting in a ubyte, and there's no need for a cast. The compiler puts a safe implicit cast. This is awesome.

- - - - - - - - - - - - - -

But of course you often want more :-)

This is another case where the current D type system allows you to avoid a cast:

void main() {
    char['z' - 'a' + 1] arr;

    foreach (immutable i, ref c; arr)
        c = 'a' + i;
}



But if you want to use ranges and functional UFCS chains you currently need the cast:


void main() {
    import std.range, std.algorithm, std.array;

    char[26] arr = 26
                   .iota
                   .map!(i => cast(char)('a' + i))
                   .array;
}


In theory this program has the same compile-time information as the foreach case. In practice foreach is a built-in that enjoys more semantics than a iota+map.

Currently iota(26) loses the compile-time information about the range, so you can't do (note: the "max" attribute doesn't exists):

void main() {
    import std.range: iota;
    auto r = iota(26);
    enum ubyte m = r.max;
}


Currently the only way to keep that compile-time information is to use a template argument:

void main() {
    import std.range: tIota;
    auto r = tIota!26;
    enum ubyte m = r.max; // OK
}

But even if you write such tIota range, the map!() will lose the compile-time value range information. And even if you manage to write a map!() able to do it with template arguments, you have template bloat.

So there's a desire to manage the compile-time information (like value range information) of (number) literals without causing template bloat and without the need to use explicit template arguments.

Bye,
bearophile

Reply via email to