I've been working on adding dcrypt to Phobos, which meant deciding whether to take it easy and push it in as-is, or to convert all of the classes and interfaces into structs with type-checking templates (I'll be doing the latter). That got me thinking about the pros and cons of the two options.

Basically, it seemed to come down to:

Pro-structs
1. Smaller, faster
2. Adds unique D-ish flavor to the standard library

Anti-structs
1. Difficult to read. E.g. a beginning programmer needs to be able to look at this and understand it before he can use the library:

size_t insertBack(Stuff)(Stuff stuff) if (isImplicitlyConvertible!(Stuff, T) || isInputRange!Stuff && isImplicitlyConvertible!(ElementType!Stuff, T));

And of course, even if one can read it, one still needs to track down isInputRange() and isImplicitlyConvertible() to find out what they do.
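
For what it's worth, here is roughly what that constraint admits and rejects (a sketch, assuming the signature above is std.container's Array.insertBack):

import std.container.array : Array;

void example()
{
    Array!int arr;
    arr.insertBack(5);           // Stuff = int: implicitly convertible to T
    arr.insertBack([1, 2, 3]);   // Stuff = int[]: an input range whose elements convert to T
    // arr.insertBack(3.5);      // rejected: double does not implicitly convert to int
}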

2. When using classes and interfaces, type checking is as simple as writing the type to be passed in as a parameter. With struct-typing, people need to manually append a series of checks all over the place, which results in things like the following:

auto uniform(T, UniformRandomNumberGenerator)(ref UniformRandomNumberGenerator urng) if (!is(T == enum) && (isIntegral!T || isSomeChar!T));

Even though isUniformRNG() exists and is used in the other overloads of uniform(), it isn't used here, because whoever wrote this particular overload forgot to include it.

3. Non-extensible. To "extend" an SList or whatever, I would have to write a wrapper class with methods that forward to the SList methods, if I wanted my object to be able to interoperate with Phobos, or I would need to modify Phobos so that the body of SList was a template that could be mixed in (assuming I didn't want to override any methods, just add new ones).
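
For illustration, the wrapper approach looks roughly like this (a sketch; CountingList is a made-up example, not anything in Phobos):

import std.container.slist : SList;

struct CountingList(T) {
    SList!T list;        // the wrapped Phobos container
    size_t insertions;   // the one piece of added state

    alias list this;     // forward everything else to SList

    void insert(T value) {
        list.insertFront(value);
        ++insertions;
    }
}

unittest {
    CountingList!int c;
    c.insert(1);
    assert(c.insertions == 1 && !c.empty);  // .empty comes from SList via alias this
}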

4. Unclear how great an advantage the smaller/faster aspect actually gives us relative to the demerits of this style of coding. For example, using code from the write-up below, I tested the performance of inserting 100,000 items and then removing them all, 100 times over, with both the interface/class version and the struct version, for a total time of 1905±4ms vs. 1930±10ms (with a slightly smaller test the struct won, suggesting that there's no real difference). My suspicion was that the compiler/linker detected that there was only a single implementation with no subclasses and compiled it down to something roughly equivalent, with no vtable dispatch. So I split the class implementation of HashSet in two, with an abstract base class containing the "data" variable and nothing else and all the methods declared in the subclass, and that bumped the runtime to 1980±10ms. But even that is only a 2.5% difference in speed, and only if one goes gung-ho with layering classes.

And for many objects - like random generators or HashSets - you aren't going to have masses of instances of the same type, just one top-level instance that internally contains basic data types (like structs), so there likely won't be much of a memory difference for most applications either.
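
For reference, here is a rough sketch of that kind of timing loop (not the exact harness used; it assumes a struct HashSet that mirrors the class version from the write-up below):

import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

void benchmark(SetType)()
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (run; 0 .. 100)
    {
        static if (is(SetType == class))
            auto s = new SetType;     // interface/class version
        else
            SetType s;                // struct version
        foreach (i; 0 .. 100_000)
            s.insert(i);
        while (s.length > 0)
            s.removeFront();
    }
    writeln(SetType.stringof, ": ", sw.peek.total!"msecs", " ms");
}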


Personally, I think that a better method of type specialization needs to be added to the language (e.g. allowing isX!T checks to be written in the places where we write types, rather than as bolted-on compile-time predicates, or allowing structs to implement interfaces) so that postfix "if" constraints on function declarations become a last-ditch option rather than the norm. Though, as pointed out above, it might not be worth it unless someone can point to truly significant performance advantages.

But certainly for the moment, at least having an article about these idioms on the main site seems advisable, given how frequently they appear in the standard library. Plus, users and Phobos developers might want a quick tutorial for writing their own.

So here's a quick write-up:

----

The core library of a language should be small, fast, powerful, and generic. This last item, however, causes a problem for library architects. Traditionally, the more generic something is, the larger and slower it is - with several layers of abstraction and indirection to allow many disparate implementations of a generic idea to all serve under the same interface. D supports all of the features that would allow one to write a library like this, but it also supports features that allow one to write a library which does not sacrifice performance for generality.

For example, here would be a standard declaration of a Set interface and a simple implementation as a class:

interface Set(T) {
    size_t length();
    size_t insert(T);
    T front();
    T back();
    void removeFront();
    void removeBack();
}

class HashSet(T) : Set!(T) {
private:
    void[0][T] data;    // zero-size values: only the keys matter, so the AA acts as a set

public:
    size_t length() {
        return data.length;
    }

    size_t insert(T value) {
        data[value] = [];
        return data.length;
    }

    T front() {
        foreach (k, v; data) {
            return k;
        }
        return T.init;
    }

    T back() {
        // an associative array has no defined order, so "back" is
        // simply another arbitrary element
        foreach (k, v; data) {
            return k;
        }
        return T.init;
    }

    void removeFront() {
        if (data.length == 0) return;

        T key;
        foreach (k, v; data) {
            key = k;
            break;
        }
        data.remove(key);
    }

    void removeBack() {
        if (data.length == 0) return;

        // again, no defined order: just remove an arbitrary element
        T key;
        foreach (k, v; data) {
            key = k;
            break;
        }
        data.remove(key);
    }
}
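
As a quick sanity check of the class (a minimal sketch; note that an associative array is unordered, so front/back simply return arbitrary elements):

unittest {
    auto s = new HashSet!int;
    assert(s.length == 0);
    s.insert(3);
    s.insert(3);     // duplicate: a set stores only one copy
    s.insert(5);
    assert(s.length == 2);
    s.removeFront();
    s.removeBack();
    assert(s.length == 0);
}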

When we write a function which accepts a Set (say, a Set!int), we can write:

void foo(Set!int someSet) {
    import std.stdio : writeln;

    someSet.insert(42);
    writeln(someSet.front);
}

This will accept and operate on any class which implements the Set interface (as you would expect).
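
For instance (a usage sketch with the HashSet class above):

auto set = new HashSet!int;
foo(set);   // fine: HashSet!int implements Set!int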

To reduce overhead, the convention in the D standard library is to implement this sort of thing as a struct instead of as a class. But a struct cannot implement an interface, so we would traditionally have no way to write functions which accept generic types, like Sets.

The first part of the solution is the template system. If a function accepts a templated type, then, so long as the body compiles once the type has been resolved, any arbitrary class or struct which implements the correct functions will compile and operate correctly.

void foo(Set)(Set someSet) {  // "Set" here is just a template type parameter name
    import std.stdio : writeln;

    someSet.insert(42);       // only compiles if Set resolves to a type with insert...
    writeln(someSet.front);   // ...and front defined
}

In a sense, this is a method for duck typing in D, though not a very good one: looking at the function declaration, it is not clear which types are or are not valid as parameters to foo(). We must read through the implementation of foo() and find the usages of someSet to determine what is required. Likewise, there is nothing like "interface Set" mandating a standardized set of methods that all of our functions can expect and that developers of new Set implementations know to provide.

This is corrected by use of compile-time duck-type checking templates. First we declare a template which implements a static test of compilability of the type for our chosen interface.

template isSet(Set) {
    enum bool isSet = is(typeof(
    (inout int = 0)
    {
        Set s = void;           // can declare an instance
        if (s.length == 0) {}   // can test size
        s.insert(s.front);      // can insert (an element of the set's own type)
        auto h = s.front;       // can access front
        h = s.back;             // can access back
        s.removeFront;          // can remove front
        s.removeBack;           // can remove back
    }));
}

Following this, the first thing we want to do is validate that our own implementation is compliant:

struct HashSet(T) {
    static assert( isSet!(HashSet!T) );
    ...
}

We can then upgrade foo() so that it accepts only types which conform:

void foo(Set)(Set someSet) if (isSet!Set) {
    ...
}
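
As a quick usage sketch (reusing the struct HashSet and foo() from above): a conforming type satisfies the constraint, while anything else is rejected at the call site.

HashSet!int h;
h.insert(42);
foo(h);      // fine: isSet!(HashSet!int) is true
// foo(42);  // compile error: isSet!int is false, so foo's constraint rejects the call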

While more verbose, this sort of duck typing offers advantages beyond just speed. Because it tests only compilability, it leaves room for flexibility in your definitions. For example, the random number generators in std.random are defined so that they can be treated like InputRange types. Any InputRange is expected to report whether it is empty, but since a random number generator never becomes empty, std.random defines "empty" as a constant (enum) attribute rather than a method.
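
Here is a minimal sketch of that pattern (this toy generator is mine, not std.random's): isInputRange only checks that the relevant expressions compile, so a manifest constant satisfies the "empty" requirement just as well as a method would.

import std.range : isInputRange;

struct ToyRng {
    enum bool empty = false;   // never runs out: a compile-time constant, not a method
    private uint state = 42;

    @property uint front() const { return state; }
    void popFront() { state = state * 1664525 + 1013904223; } // LCG step
}

static assert(isInputRange!ToyRng);  // the enum satisfies the "empty" check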

Another, similar benefit is the ability to leave return types unspecified. For example, asymmetric cryptography algorithms always generate public and private keys, but the values that comprise those keys depend on the algorithm used. RSAPublicKey and ElGamalPublicKey are basic data structures, not actionable classes; there is no reason for them to share a base class or common interface. With D's template-based duck typing, one can validate the existence of public/private key accessors (using "auto" to hold the return values) without caring whether the return types of different implementations are the same or even interchangeable.
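
A hedged sketch of the idea (these type and member names are illustrative stand-ins, not dcrypt's actual API): two algorithms expose keyPair() with completely different return types, and generic code just stores the result in an auto variable.

struct RSAKeyPair     { int n, e, d; }                     // dummy fields
struct ElGamalKeyPair { int p, g, publicKey, privateKey; } // dummy fields

struct RSA     { RSAKeyPair     keyPair() { return RSAKeyPair(3233, 17, 413); } }
struct ElGamal { ElGamalKeyPair keyPair() { return ElGamalKeyPair(23, 5, 8, 6); } }

// Only requires that keyPair() exists and compiles; the return types never
// need a common base type.
void describe(Algo)(Algo algo)
    if (is(typeof(algo.keyPair())))
{
    auto keys = algo.keyPair();   // concrete type differs per algorithm
    // ... serialize the keys, hand them to the matching cipher, etc. ...
}

unittest {
    describe(RSA());
    describe(ElGamal());
}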
