Re: Could we reserve void[T] for builtin set of T ?

Q. Schroll via Digitalmars-d Fri, 01 Apr 2016 01:59:24 -0700

On Thursday, 31 March 2016 at 19:57:50 UTC, Walter Bright wrote:

aa[x] = true;  // add member x
aa[x] = false; // remove member x
x in aa; // compile error


On Friday, 1 April 2016 at 02:36:35 UTC, Jonathan M Davis wrote:

Still, while it's true that aa.remove is how you'd normally doit, I think that Walter's suggestion of assigning true or falsemakes by far the most sense of the ones made thus far - and youcould just make
aa.remove(key);

and

aa[key] = false;

equivalent for void[T] to make it more consistent.

- Jonathan M Davis

The basic idea is great. But how could the details look like?What do we want in detail?

We want simple and suggestive operations on sets, likemodification of single elements, iteration, union, intersection,(relative) complement, cartesian product, pointwise functionapplication, filtering and much more. Maybe even suggestive setcomprehension is possible.


About the special case:

We encounter (yet another) special case for something built ofvoid. This is not a major deal. Everyone knows (should know) thatvoid is a special case nearly everywhere it emerges:

 • void returning functions.
 • void* is much different from any other pointer.
 • void[] and void[n] are different from usual arrays.
Now we add
 • void[T] is different from K[T] (for K != void).

From the view of a D programmer, this is not a big deal to accept.

From the view of a D learner, this is yet another void specialcase, maybe even easier to fully understand than void*.

First of all, we do not use the term associative array for sets.This is wrong and confusing at least for the beginners.

We can have AA declaration syntax without AA indexing syntax. Theset indexing syntax will be a bit different.


    void[T] s; // declare s as a set of T

    s.add(x);    // add    x of type T
    s.remove(x); // remove x of type T

auto r1 = x in s; // r1 is of type bool; r1 == true iff x is--- in s.auto r2 = x !in s; // r2 is of type bool; r2 == true iff x isnot in s.


Further we allow not only for convenience
    s[x] = true;  // Same side effect of add.
    s[x] = false; // Same side effect of remove.

This is not AA indexing syntax so let's call it set indexing andnothing else. It looks similar to bool[T] syntax, but a void[T]is different from bool[T] by design and idea.

The methods add and remove return bool values that indicate thestate being changed:

 • add    returns true iff key has not been already present.
 • remove returns true iff key has     been already present.

The opIndexAssign should return the assigned bool; everythingelse would be totally unexpected. It is a bit like assigning theexpression

    x in s
which is an rvalue.

It is illegal to use the opIndex with parameters. The only legalindexing expressions will be

 • s[]: legally used to operate on the set pointwise (see later).
 • s[x] = b

• s[x] op= b, where op is one of |, &, ^: s[x] op= b does s[x] =(x in s) op b.I don't see an application of the latter, but there is no reasonto disallow it; rather discourage it.

When assigning bool literals with set indexing syntax where thevalue of the assignment is not used, the compiler should emit awarning and suggest using add or remove respectively.


    bool b = expression();
    s[x] = true;         // compiler suggests using add.
    s[x] = false;        // compiler suggests using remove.
    s[x] = b;            // ok, b is not a literal.
    s[x] = expression(); // ok.

s[x] = s[y] = true; // ok; for s[y] = true, the value (true)is being used,// for s[x] we have s[y] = true asexpression.


Known from AAs we also will have
 • sizeof
 • length
 • dup
 • rehash
 • clear
in the expected form, but we won't have
 • keys
 • values
 • byKey()
 • byValue()

That is because to be it is odd to call the elements keys. Andwhat are the values then?

We don't even need these.

New/changed ones:
 • singelton (new)
 • get (known from AA, but other sematics)

For a singelton set, singelton returns the only element.Otherwise RangeError.get returns a pointer to the only element of a singelton set ornull for the empty set. If the set contains more than oneelement, RangeError. get can be useful with


    if (auto xp = s.get)
    { /+ use the unique element x = *p +/ }
    else
    { /+ empty set handling +/ }

[Aside: There is no add for AAs for a good reason.]

Iteration:

    foreach (ref x; s) // ref is optional!
    { ... }

Because the pseudo index type is void, there are no index values.So indexed iteration is illegal.

    foreach (i, ref x; s) // compile-time error.
    { ... }

Sets cannot be lockstepped over; that would need canoncial order.But it makes sense to chain sets etc.


Initialization:

The set can also be initialized by a value of T, T[] or bool[T].
    void[T] s;                       // makes empty set.

void[T] s = x; // x of type T: Makessingleton set.void[T] s = [ a, b ]; // x in s iff x == a or x ==b

    void[T] s = [ a:true, b:false ]; // x in s iff x == a.

If a set is initialized by an AA literal with only literal bools,the compiler should suggest using an array instead.

There is no need for a special set literal. We could allow
    void[T] s = [ a:, b: ];

The colon is from the AA literal syntax, with no value to the keyas void has no values.We could allow assigning any AA and taking the keys only. Thatwould make the usage of bool[T] inconsistent. It is better to use

    void[T] s = aa.keys;   // if aa of type S[T]
    void[T] s = aa.values; // if aa of type T[S]

We could allow further
    s[x, y] = b; // nothing different from s[x] = s[y] = b;
and
    void[T] t
    s[t] = b; // foreach (x; t) s[x] = b;

Set operations (oh, no... maths):

    void[int] s  = [ 0, 1 ];
    void[int] t  = [ 0,    2 ];

    void[int] u  = [ 0, 1, 2 ];   // s ∪ t
    void[int] i  = [ 0 ];         // s ∩ t
    void[int] d  = [ 1, 2 ];      // s Δ t
    void[int] m1 = [ 1 ];         // s \ t
    void[int] m2 = [ 2 ];         // t \ s

    assert (s | t == u);   // union
    assert (s & t == i);   // intersection
    assert (s ^ t == d);   // symmetric difference
    assert (s - t == m1);  // relative complement
    assert (s / t == m2);  // reverse relative complement

assert (s ~ t == u); // union (alias); compiler shouldsuggest using |.

Rationale for using bit/logic operators? Let op be one of |, & or^. Then

      x in (s op t) == (x in s) op (x in t)

This is very consistent with assigning bools in indexingexpressions.

Rationale for better not using ~ for sets at all: Sets aredifferent from arrays by idea.

It is fundamental for arrays that
    int[] a, b;
    assert ((a ~ b).length == a.length + b.length);
holds. For sets it does not.

Maybe this is interesting:
 • s[t] = true   does  s |= t.
 • s[t] = false  does  s -= t.

[I have thought about the disjoint union of sets. One problem isthe two meanings of this term. There seems not to be a perfectsolution to this issue.]


Union and intersection:

    void[void[T]] ss;       // set of sets.
    auto u = join(ss);      // u is of type void[T]
    auto i = intersect(ss); // i is of type void[T]

Also we have disjoint, that returns if either a single set ofsets has disjoint elements or when given more than one parameter,tests if the given sets are disjoint.

    auto r1 = disjoint(ss); // set of sets
    void[S] s;
    void[T] t;
    // Assume the comapaison x == y is legal for S x; T y;
    auto r2 = disjoint(s, t); // some sets

Pointwise application of a function:

In math there is f[s] for { f(x) | x in s }. We cannot use thisgreat notation directly. What can be done is

    auto r = pointwise(&f, s); // f.pointwise(s);
and/or
    auto r = pointwise(s, &f); // s.pointwise(f);

It is possible to distinguish sets and functions perfectly.

The option s[f] is open, but looks very odd.

Another way to go is
    struct pointwise(alias f)
    if (!is(Parameters!f == AliasSeq!()))
    {
        alias T    = Parameters!f[0];
        alias Args = Parameters!f[1 .. $];
        static opCall(T x, Args args)
        {
            return f(x, args);
        }
        static opIndex(void[T] s, Args args)
        {
            alias R = typeof(f(s.singelton));
            void[R] result;
            foreach (x; s)
                result.add(f(s, args));
            return result;
        }
    }

    alias f = pointwise!fBase;
    f(x, args); // simple application
    f[s, args]; // pointwise application on a set.

Further we should provide a mechanism for f[[S]] = { f[s] | s inS }, i.e.

    alias f = pointwise!(pointwise!fBase);
    f(x, args); // simple application
    f[ss, args]; // pointwise application on a set of sets.
    // pointwise application on a set not possible via f.

The main problem is that a function can be overloaded with
    R f(void[T] s, Args)
and
    R f(     T  x, Args)

It is possible but not desirable to make pointwise detect thesecases and forward opIndex to opCall. For this, there can be aseparate solution.


We can have pointwise operator application rather easily.
    void[S] s;
    void[T] t;

for any overloadable unary operator op: op s[] is like { op x | xin s }

    alias R = typeof(op s.singelton);
    void[R] r;
    foreach (x; s)
        r.add(op x);

for any overloadable binary operator op: s[] op t[] is like { xop y | x in s, y in t },

    alias R = typeof(s.singelton op t.singelton);
    void[R] r;
    foreach (x; s)
    foreach (y; t)
        r.add(x op y);

These can be computed lazily under certain circumstands, e.g. ifthe sets are immutable or in

    foreach (x; s[] + t[])
    { ... }
if s and t are not changed in the foreach body.

We can allow filtering a set by condition via
    auto t = s(predicate); // typeof(t) == typeof(s)

Set comprehension:

The mathematical notation is not unambiguous and partly informal.The minimal support of set comprehension is{ f(x1,..., xn) | x1 in s1,..., xn in sn,predicate1(x1,..., xn) ... predicatem(x1,..., xn) }We have an expression on the left, bindings and predicates on theright. Maybe one way is a template function

    setComp!("f(x1, ..., xn)",
             "x1 in s1, ..., xn in sn",

"predicate1(x1, ..., xn) ... predicatem(x1, ...,xn)")

with predicates optional.


One thing I have left out? Yes, the complement.
    void[T] s;
    void[T] t = ~s;

We can decide, wether the complement is referentially bound tothe original or not. This is an issue but not the major one.Let's say no, use a copy and go on.As some types are infinite, we must simulate complements. Theeasiest one is having a private bool isComplement to indicatewether the content semantically is the set or the complement. Fora complement, the operations

 • x in s
 • s[x] = b
 • s | t, s & t, s ^ t, etc.
can be emulated perfectly.

The great issue is iteration over complements. One solution ismaking complements another type, because being able to iterate aset just sometimes is inacceptable. For some types of sets, likevoid[bool] which has the four values [ ], [ false ], [ true ], [false, true ], complements are easy. This holds for all finitetypes, but is impracticable for lange ones like long. It makessense to have special treatment for small enums. In addition, oneof D's goals is to fit user defined types best into the language.We need an interface for user defined types that tell that thecomplement operation on sets of them is a meaningful thing andnot just a coated set.

Re: Could we reserve void[T] for builtin set of T ?

Reply via email to