On Thursday, 31 March 2016 at 19:57:50 UTC, Walter Bright wrote:
aa[x] = true;  // add member x
aa[x] = false; // remove member x
x in aa; // compile error

On Friday, 1 April 2016 at 02:36:35 UTC, Jonathan M Davis wrote:
Still, while it's true that aa.remove is how you'd normally do it, I think that Walter's suggestion of assigning true or false makes by far the most sense of the ones made thus far - and you could just make

aa.remove(key);

and

aa[key] = false;

equivalent for void[T] to make it more consistent.

- Jonathan M Davis

The basic idea is great. But how could the details look like? What do we want in detail?

We want simple and suggestive operations on sets, like modification of single elements, iteration, union, intersection, (relative) complement, cartesian product, pointwise function application, filtering and much more. Maybe even suggestive set comprehension is possible.

About the special case:

We encounter (yet another) special case for something built of void. This is not a major deal. Everyone knows (should know) that void is a special case nearly everywhere it emerges:
 • void returning functions.
 • void* is much different from any other pointer.
 • void[] and void[n] are different from usual arrays.
Now we add
 • void[T] is different from K[T] (for K != void).

From the view of a D programmer, this is not a big deal to accept.
From the view of a D learner, this is yet another void special case, maybe even easier to fully understand than void*.

First of all, we do not use the term associative array for sets. This is wrong and confusing at least for the beginners.

We can have AA declaration syntax without AA indexing syntax. The set indexing syntax will be a bit different.

    void[T] s; // declare s as a set of T

    s.add(x);    // add    x of type T
    s.remove(x); // remove x of type T

auto r1 = x in s; // r1 is of type bool; r1 == true iff x is --- in s. auto r2 = x !in s; // r2 is of type bool; r2 == true iff x is not in s.

Further we allow not only for convenience
    s[x] = true;  // Same side effect of add.
    s[x] = false; // Same side effect of remove.

This is not AA indexing syntax so let's call it set indexing and nothing else. It looks similar to bool[T] syntax, but a void[T] is different from bool[T] by design and idea.

The methods add and remove return bool values that indicate the state being changed:
 • add    returns true iff key has not been already present.
 • remove returns true iff key has     been already present.

The opIndexAssign should return the assigned bool; everything else would be totally unexpected. It is a bit like assigning the expression
    x in s
which is an rvalue.

It is illegal to use the opIndex with parameters. The only legal indexing expressions will be
 • s[]: legally used to operate on the set pointwise (see later).
 • s[x] = b
• s[x] op= b, where op is one of |, &, ^: s[x] op= b does s[x] = (x in s) op b. I don't see an application of the latter, but there is no reason to disallow it; rather discourage it.

When assigning bool literals with set indexing syntax where the value of the assignment is not used, the compiler should emit a warning and suggest using add or remove respectively.

    bool b = expression();
    s[x] = true;         // compiler suggests using add.
    s[x] = false;        // compiler suggests using remove.
    s[x] = b;            // ok, b is not a literal.
    s[x] = expression(); // ok.
s[x] = s[y] = true; // ok; for s[y] = true, the value (true) is being used, // for s[x] we have s[y] = true as expression.

Known from AAs we also will have
 • sizeof
 • length
 • dup
 • rehash
 • clear
in the expected form, but we won't have
 • keys
 • values
 • byKey()
 • byValue()
That is because to be it is odd to call the elements keys. And what are the values then?
We don't even need these.

New/changed ones:
 • singelton (new)
 • get (known from AA, but other sematics)
For a singelton set, singelton returns the only element. Otherwise RangeError. get returns a pointer to the only element of a singelton set or null for the empty set. If the set contains more than one element, RangeError. get can be useful with

    if (auto xp = s.get)
    { /+ use the unique element x = *p +/ }
    else
    { /+ empty set handling +/ }

[Aside: There is no add for AAs for a good reason.]

Iteration:

    foreach (ref x; s) // ref is optional!
    { ... }
Because the pseudo index type is void, there are no index values. So indexed iteration is illegal.
    foreach (i, ref x; s) // compile-time error.
    { ... }
Sets cannot be lockstepped over; that would need canoncial order. But it makes sense to chain sets etc.

Initialization:

The set can also be initialized by a value of T, T[] or bool[T].
    void[T] s;                       // makes empty set.
void[T] s = x; // x of type T: Makes singleton set. void[T] s = [ a, b ]; // x in s iff x == a or x == b
    void[T] s = [ a:true, b:false ]; // x in s iff x == a.
If a set is initialized by an AA literal with only literal bools, the compiler should suggest using an array instead.
There is no need for a special set literal. We could allow
    void[T] s = [ a:, b: ];
The colon is from the AA literal syntax, with no value to the key as void has no values. We could allow assigning any AA and taking the keys only. That would make the usage of bool[T] inconsistent. It is better to use
    void[T] s = aa.keys;   // if aa of type S[T]
    void[T] s = aa.values; // if aa of type T[S]

We could allow further
    s[x, y] = b; // nothing different from s[x] = s[y] = b;
and
    void[T] t
    s[t] = b; // foreach (x; t) s[x] = b;

Set operations (oh, no... maths):

    void[int] s  = [ 0, 1 ];
    void[int] t  = [ 0,    2 ];

    void[int] u  = [ 0, 1, 2 ];   // s ∪ t
    void[int] i  = [ 0 ];         // s ∩ t
    void[int] d  = [ 1, 2 ];      // s Δ t
    void[int] m1 = [ 1 ];         // s \ t
    void[int] m2 = [ 2 ];         // t \ s

    assert (s | t == u);   // union
    assert (s & t == i);   // intersection
    assert (s ^ t == d);   // symmetric difference
    assert (s - t == m1);  // relative complement
    assert (s / t == m2);  // reverse relative complement

assert (s ~ t == u); // union (alias); compiler should suggest using |.

Rationale for using bit/logic operators? Let op be one of |, & or ^. Then
      x in (s op t) == (x in s) op (x in t)
This is very consistent with assigning bools in indexing expressions.

Rationale for better not using ~ for sets at all: Sets are different from arrays by idea.
It is fundamental for arrays that
    int[] a, b;
    assert ((a ~ b).length == a.length + b.length);
holds. For sets it does not.

Maybe this is interesting:
 • s[t] = true   does  s |= t.
 • s[t] = false  does  s -= t.

[I have thought about the disjoint union of sets. One problem is the two meanings of this term. There seems not to be a perfect solution to this issue.]

Union and intersection:

    void[void[T]] ss;       // set of sets.
    auto u = join(ss);      // u is of type void[T]
    auto i = intersect(ss); // i is of type void[T]
Also we have disjoint, that returns if either a single set of sets has disjoint elements or when given more than one parameter, tests if the given sets are disjoint.
    auto r1 = disjoint(ss); // set of sets
    void[S] s;
    void[T] t;
    // Assume the comapaison x == y is legal for S x; T y;
    auto r2 = disjoint(s, t); // some sets

Pointwise application of a function:

In math there is f[s] for { f(x) | x in s }. We cannot use this great notation directly. What can be done is
    auto r = pointwise(&f, s); // f.pointwise(s);
and/or
    auto r = pointwise(s, &f); // s.pointwise(f);

It is possible to distinguish sets and functions perfectly.

The option s[f] is open, but looks very odd.

Another way to go is
    struct pointwise(alias f)
    if (!is(Parameters!f == AliasSeq!()))
    {
        alias T    = Parameters!f[0];
        alias Args = Parameters!f[1 .. $];
        static opCall(T x, Args args)
        {
            return f(x, args);
        }
        static opIndex(void[T] s, Args args)
        {
            alias R = typeof(f(s.singelton));
            void[R] result;
            foreach (x; s)
                result.add(f(s, args));
            return result;
        }
    }

    alias f = pointwise!fBase;
    f(x, args); // simple application
    f[s, args]; // pointwise application on a set.

Further we should provide a mechanism for f[[S]] = { f[s] | s in S }, i.e.
    alias f = pointwise!(pointwise!fBase);
    f(x, args); // simple application
    f[ss, args]; // pointwise application on a set of sets.
    // pointwise application on a set not possible via f.

The main problem is that a function can be overloaded with
    R f(void[T] s, Args)
and
    R f(     T  x, Args)
It is possible but not desirable to make pointwise detect these cases and forward opIndex to opCall. For this, there can be a separate solution.

We can have pointwise operator application rather easily.
    void[S] s;
    void[T] t;

for any overloadable unary operator op: op s[] is like { op x | x in s }
    alias R = typeof(op s.singelton);
    void[R] r;
    foreach (x; s)
        r.add(op x);

for any overloadable binary operator op: s[] op t[] is like { x op y | x in s, y in t },
    alias R = typeof(s.singelton op t.singelton);
    void[R] r;
    foreach (x; s)
    foreach (y; t)
        r.add(x op y);

These can be computed lazily under certain circumstands, e.g. if the sets are immutable or in
    foreach (x; s[] + t[])
    { ... }
if s and t are not changed in the foreach body.

We can allow filtering a set by condition via
    auto t = s(predicate); // typeof(t) == typeof(s)

Set comprehension:

The mathematical notation is not unambiguous and partly informal. The minimal support of set comprehension is { f(x1,..., xn) | x1 in s1,..., xn in sn, predicate1(x1,..., xn) ... predicatem(x1,..., xn) } We have an expression on the left, bindings and predicates on the right. Maybe one way is a template function
    setComp!("f(x1, ..., xn)",
             "x1 in s1, ..., xn in sn",
"predicate1(x1, ..., xn) ... predicatem(x1, ..., xn)")
with predicates optional.


One thing I have left out? Yes, the complement.
    void[T] s;
    void[T] t = ~s;
We can decide, wether the complement is referentially bound to the original or not. This is an issue but not the major one. Let's say no, use a copy and go on. As some types are infinite, we must simulate complements. The easiest one is having a private bool isComplement to indicate wether the content semantically is the set or the complement. For a complement, the operations
 • x in s
 • s[x] = b
 • s | t, s & t, s ^ t, etc.
can be emulated perfectly.

The great issue is iteration over complements. One solution is making complements another type, because being able to iterate a set just sometimes is inacceptable. For some types of sets, like void[bool] which has the four values [ ], [ false ], [ true ], [ false, true ], complements are easy. This holds for all finite types, but is impracticable for lange ones like long. It makes sense to have special treatment for small enums. In addition, one of D's goals is to fit user defined types best into the language. We need an interface for user defined types that tell that the complement operation on sets of them is a meaningful thing and not just a coated set.

Reply via email to