[IndexedUnion] (new) New topic: IndexedUnion

wyscc Sun, 15 Jul 2007 14:34:02 -0700

Changes http://wiki.axiom-developer.org/IndexedUnion/diff
--
Goal: To explore the issues and develop tools for implementing 'IndexedUnion', 
a new domain constructor proposed by William Sit.


Discussion: The idea of creating a new domain 'IndexedUnion' in Axiom grows out 
of a discussion on whether the current 'Union', a primitive type that allows 
two versions, tagged and untagged, should be generalized to allow a mixed 
tagged/untagged version. The current implementation specifies that in the 
tagged version, all domains forming the union must be distinct, while in the 
untagged version, the tags must be distinct. The compiler does not enforce 
these restrictions. Also, in the tagged case, the exported functions that 
construct elements from component domains have identical signatures if the 
domains are the same, making it only possible to create elements from the first 
of such identical domains. 

\begin{axiom}
)show Union(a:Integer, b:Integer)
\end{axiom}

Notice that there are two 'construct: Integer->%'. The proposed 'IndexedUnion' 
domain will be more general, and will avoid such problems. Unfortunately, there 
are a number of hurdles and an examination of these gives insight into the 
original design.


Contributors: Ralph Hemmecke, Bill Page, William Sit, Stephen Wilson (feel free 
to add yours; this page is open source)

The construction of 'Union' (in Axiom) as implementing the notion of coproduct 
or external direct sum in Category Theory,
specifically as disjoint union in Set Theory and taking place in the category 
of sets, should have nothing to do with tags, but with an
*index set* and the sets associated with the indices. The notions of union, 
coproduct, and colimit are distinct concepts in
category theory, but let's not get too involved with these for the time being. 
Since in Axiom, sets are domains of 'SetCategory', and an indexed family of 
sets is really a mapping from the indexed set to the category of sets, 
accordingly, it is proposed to create an alternative::

   IndexedUnion(I: SetCategory, g:I -> SetCategory):SetCategory 

   Example 1: 
   I:= Integer 
   g(i:I):SetCategory == if even? i then Integer else String 
   J:= IndexedUnion(I,g) 

A more mathematical example would be to construct the prime field if 'i' is 
prime, and a product of prime fields over distinct prime factors of 'i' if 'i' 
is
composite. The disjoint union of $n$ copies of ${\mathbb N}^n$ can be used to 
represent all partial derivatives of $n$ dependent functions with respect to 
$m$ independent variables.  In the next example, we construct the set of all 
square matrices with integer entries::

   Example 2: 
   I:=PositiveInteger 
   g(i:I):SetCategory == SquareMatrix(i, Integer) 
   J:=IndexedUnion(i,g) 

Technically, 'g' here is a domain constructor. Whether the implementation 
should be done as primitive or not depends on the language
limitation of Spad. As we shall see, at the Spad level, we would need domains 
and categories to be first class objects and maps
would need to allow input-dependent types as its output. So Axiom may not allow 
that. 

As it currently exists, 'Union' (in Axiom) can form the disjoint union of a 
*finite* number of sets (each of which is a domain of
'SetCategory', NOT a finite set of elements in 'Set S' for some domain 'S'). 
The index set in the tagged version is
the set of tags, but this is clearly not enough (there being no useful 
operations in the set of tags). The index set in the untagged
version probably defaults to ${0, ..., n-1}$ where $n$ is the number of sets 
(domains) in the union, counted with multiplicities. 'Union' is a primitive 
type and its code lives below the algebra level; apparently, 'Rep' for 'Union' 
is of the form '(i.x)' where 'i' is an integer ranging from $0$ to $n-1$ and 
'x' is the value in the domain from the $(i+1)^{\rm st}$ argument of 'Union'. 
This is independent of whether the domains are tagged or not. The tags seem to 
live in hard-coded signatures of exports when the domain is instantiated. For 
example:

\begin{axiom}
)show Union(a:Integer, b:String)
\end{axiom}

We already noted the two 'construct' (here they are distinguishable, but a 
clumsy way if there are more domains than two). One important thing to note, 
however, is the signature for two 'case' and the 'dot' functions.
These signatures in tagged 'Union' are *outrageous* (abuse of language)! 
Neither identifier 'a' 
nor identifier 'b' is a domain! The implementers again finessed the difficult 
problems because they are working below the algebra layer. They are more 
pragmatic than you think and suggest that the implementation for 'IndexedUnion' 
most likely would have to be abusing the language too.  As long as one can be 
abusing the language (or rather, there is no language to abuse at pre-algebra 
time!), why not distinguish the two 'construct' also by 'a' and 'b' (instead of 
just by the domains, which could be identical)? Something like::

   construct: (Integer, a) -> %
   construct: (Integer, b) -> %

Note here that the tags 'a' and 'b' are local and not entirely accessible! The 
tags *may* be shadowed by a global variable of the same name. The following 
example shows the poor design of the exported functions in tagged 'Union' 
(there are also problems with the untagged 'Union', you can find out the 
problems as an exercise).

\begin{axiom}
test:= Union(a:String, b:String)
c:test:="abc"
c case b
-- c.b  would failed with an error
b:Integer:=3
c case b
c.b
c case a
c.a
a:String:="d"
c case a
c.a
\end{axiom}

The convenience of tags (in the case of a small, finite index set) is that the 
user does not have to make the set of tags into a
domain (as required to become the source for the function 'g') and the tags are 
supposedly more meaningful to the user. Now, is a hybrid 'Union' type 
desirable? Example 6 below will illustrate a possible use. 
The "mixed" 'Union' where tags are allowed for a small finite subset of the 
index set would allow the best of both
worlds. Unfortunately, tags are identifiers (that is, symbols or effectively, 
strings, almost by definition) whereas there is no such restriction on an index 
set (or the rest of it). In most
cases, for the index set to remain a domain, the indices must be homogeneous in 
type and so if there are tags, then the index set
should be a subset of the set of all strings. In the finite case, it would be 
an *element* of 'Set String' (and thus, not a *domain* of 'SetCategory').  But 
'Union' should allow infinite index sets as well and an infinite set of strings 
is difficult to manipulate in code. 

One plausible reason why 'Union' is implemented as a primitive rather than at 
the algebra level is to avoid having to require a user at
the algebra level to make the index set into a *domain*.  To allow a hybrid 
index set, it would be again simpler to implement as a
primitive than at the algebra level, because requiring the user to make the 
hybrid index set into a domain
would be a bit harder and less intuitive. This, however, can be done. We need a 
domain constructor::

   TagsAsDomain( T: Set String): SetCategory 

Note that 'Set String' is used, not 'List String' (to avoid duplicates). This 
type of construction is already used in Axiom, as in the domain '"failed"' when 
we
use 'Union("failed", ...)'. So it is probably handled specifically by the 
compiler, or there is a similar function at lower level. The above
specification would enforce automatically the constraints Spad is currently 
specifying: In untagged 'Union', the domains must be distinct. In tagged 
'Union', the tags must be distinct::

  Example 3: 
  I:= TagsAsDomain({"a", "b", "c"}) 
  -- this would be {"a","b","c"} as a domain 
  g(i:I):SetCategory == 
    i = "a" => A 
    i = "b" => B 
    i = "c" => C 
  J:=IndexedUnion(I,g) 

Clearly this is a bit awkward at the algebra level, compared to:: 

  Union(a:A, b:B, c:C) 

But a compiler can easily translate the above into the detailed algebra code in 
Example 3. On the other hand, in this algebraic level
version, there is no "untagged" 'Union' (all domains in 'IndexedUnion' are 
indexed) and the "untagged" version should not require any
constraints such as the associated domains be distinct. It is only required 
that the *elements* of the index set be distinct, which is
automatically enforced because it is a set. Indeed, hybrid union indexed by a 
heterogeneous set can be handled easily too, using
'IndexedUnion' itself::

  Example 4: 
  I:=IntegerMod(2) 
  L:=TagsAsDomain({"a","b","c"}) 
  g(i:I):SetCategory == 
    zero? i => L 
    IntegerMod(5) 
  J:= IndexedUnion(I,g) 
  f(j:J):SetCategory == ... 
  K:= IndexedUnion(J, f) 

Again, it would be much easier to have this implemented enumeratively and 
perhaps as a primitive and allow the user to simply
write, in case the index set is finite and small::

  Union(a:A,b:B,c:C,D,E,F,G,H) 

However, the more abstract specification of 'IndexedUnion' is far more 
powerful:: 

  Example 5: 

  I:=IntegerMod(2) 
  L:=TagsAsDomain({"a","b","c"}) 
  g(i:I):SetCategory == 
    zero? i => L 
    String 
  J:= IndexedUnion(I,g) 
  f(j:J):SetCategory == ... 
  K:= IndexedUnion(J, f) 

In the definition of 'f', we can actually distinguish the '"a"' in 'L' from the 
'"a"' in 'String' because each element of 'IndexedUnion' is
"tagged" and we will use the '"="' from 'L' or from 'String' (this is related 
to the scoping issue of current implementation raised by
Gaby). Since 'J' in Example 5 is itself an 'IndexedUnion' object, this brings 
us to the "case" question, or more generally: 

What should be exported from 'IndexedUnion'? 

Clearly, there needs to be a way to coerce elements from the domains into the 
union. If we were to adopt Bill Page's idea that the
injections (and projections for 'Product' or 'Record') should be named by tags, 
then it would be very clumsy to implement them when
the 'Union' is over a large finite set and impossible over an infinite set.  
This may be one reason why Axiom separates 'Union' into two
versions, tagged (when the indexed set is finite and small) and untagged (the 
general case).  There are only two purposes of
tags: (1) to distinguish cases when a domain occurs multiple times in the 
arguments of 'Union', and (2) for convenience of use.  The
first of these can be handled by the index set and the second, too, but 
convenience depends on the size of the index set, as
illustrated in the previous examples. 

Based on the notion of constructing 'IndexedUnion' using an index set, and the 
poor handling of tags in the current implelmentation for 'Union', the following 
exports are proposed (there should be more if one
compares this with 'IndexedAggregate' or other 'Indexed*' constructors; they 
may be important in terms of implementation, for example, hash functions for 
managing memory usage)::

   IndexedUnion(I: SetCategory, g:I ->SetCategory):SetCategory == with 
      "=": (%, %) -> Boolean 
      index: % -> I 
         ++ index(x) returns the index i such that x belongs to g(i) 
      value(x:%): g(index(x)) 
         ++ index(x) returns x as an element in g(index(x)) 
      coerce(i:I, x:g(i)): % 
         ++ coerce(i,x) creates x as an element of % in the component g(i) 
  

These are just generalization of what currently exist but notice 'case' is 
removed and replaced by 'index' and 'value'.  Instead of 
multiple overloading of 'case' and 'coerce', we now need only one (this 
afterall, is exactly the advantage of indexing over explicit
enumeration). Of course, we need to be able to use input-dependent signatures. 
This is a requirement when we deal with
heterogeneous objects. In Example 5, where 'J' itself is defined using 
'IndexedUnion', the function 'f' may be defined in the following
way. The example illustrates how 'IndexedUnion' may be used:: 

  Example 6 (Example 5 continued): 

  I:=IntegerMod(2) 
  L:=tagsAsDomain({"a","b","c"}) 
  g(i:I):SetCategory == 
    zero? i => L 
    String 
  J:= IndexedUnion(I,g) 
  f(j, J) :SetCategory == 
    jU := value(j) 
    index(j) =$I 0 => 
       jU =$L "a" => A 
       jU =$L "b" => B 
       jU =$L "c" => C 
    IntegerMod(#jU) 
  K:=IndexedUnion(J,f) 

Of course this example is contrived. However, one can imagine constructions in 
mathematics over a set 'J' where for a finite number of
(exception) elements of 'J', there are two cases whereas for every other there 
is only one. This would necessitate exceptional
handling as in Example 6 above. 

What about the data representation of 'IndexedUnion'? Can we do this at the 
algebra level? We seem to need non-homogeneous data
representation! In lower level, this is not difficult, say using pointers 
(perhaps another reason why 'Union' is implemented as
primitive, to hide pointers at the algebra level), but at the algebra level, 
Axiom is not designed for heterogeneous objects (that's why
people use C++). A plausible representation (which is just like '(i.x)' 
discussed earlier) is::

    Rep:= Record( index: I, value:g(i) ) 

if allowed at the algebra level, would be perfect. (Note that 'Record', 
categorically the
product construction, should also be done using an index set.) A plausible and 
trivial implementation of the exported functions is:: 

      x = y == (x.index = y.index) and (x.value = y.value) 
      index(x) == x.index 
      value(x)== x.value 
      coerce(i:I, v:g(i)) == [i, v]$Rep 
  
By generalizing 'TagsAsDomain', replacing 'String' by any domain, we can create 
arbitrarily complicated index sets to use in 'IndexedUnion'.

Below, we have test code to implement 'Domain' (but called 'MyDomain' to avoid 
possible confusion with the actual 'Domain') and for 'ListAsDomain' 
'SetAsDomain' and 'StreamAsDomain'.  To facilitate discussion, line references 
are in-lined as comments::


\begin{axiom}
)abbrev domain MYDOM MyDomain 
++ Author: William Sit
++ Date Created: July 15, 2007
++ Description: This domain attempts to be the Domain of all domains
++   to allow for construction such as Union which are based on an index set. 
++   The family of domains can then be specified by a map from the index 
++   set to this domain. The domain is needed in order for a list of domains
++   (even finite) to be typed.
-- *** Warning: Design may change and limitation in use applies.
MyDomain():Spec == Imp where                                         -- 1
   Spec == Join(SetCategory, CoercibleTo(OutputForm)) with           -- 2
      coerce: String -> %                                            -- 3
      coerce: SExpression -> %                                       -- 4
   Imp == add                                                        -- 5
      Rep:= SExpression                                              -- 6
      x = y == ([EMAIL PROTECTED]) = ([EMAIL PROTECTED])                        
             -- 7
      coerce(x:%):OutputForm == [EMAIL PROTECTED] pretend OutputForm            
 -- 8
      coerce(x:String):% == convert(x::Symbol)$Rep pretend %         -- 9
      coerce(x:SExpression):% == x pretend %                         --10
\end{axiom}

We declared 'MyDomain' as a *domain* constructor (line 1) when it may be argued 
that a *category* constructor is more appropriate. But we are only in the 
exploration stage. The crash that we shall see may be due to this though.  We 
intend objects in this domain will be constructed from 'String' (line 3) or 
'SExpression' (line 4), such as '"Integer"' ('String') or '(Integer)' 
('SExpression').  We use 'SExpression' for data representation because at this 
time, we don't know what better domain to use! We already anticipate difficulty 
ahead (at least at the algebra level) to turn an 'SExpression' into an actual 
domain. There are differences.

\begin{axiom}
p:=POLY FRAC INT
a:p:= 2*x+1
q:=dom(sample()$p)
b:q:= 2*x + 1
\end{axiom}

Here 'p' is a genuine *domain* whereas its 'SExpression' 'q' is simply a nested 
list of lisp symbols. We need to be able to turn an 'SExpression'
into a genuine domain. 

In line 8, 'pretend' has to be used. The alternatives::
  
   (coerce([EMAIL PROTECTED])@OutputForm)$SExpression
   coerce([EMAIL PROTECTED])$Expression

both did compile. However, the following run-time command crashed Axiom::

   a:="Integer"::MyDomain

In line 9, 'x::Symbol' is used because this is how say 'Integer' appears in 
'SEX'::

\begin{axiom}
c:=dom(sample()$Integer)
symbol?(car c) 
\end{axiom}

Warning: 'x::Symbol' only works for a domain defined with one identifier.
It would not make sense for domain constructors with parameters like 
'"SQMATRIX(5, FRAC INT)"'.
The 'SExpression' is a list of the form '(f x1 x2 ...)' where 'f' is
the constructor and 'x1', 'x2' are parameters for 'f', recursively repressented 
using the same
format; constants are either elements of 'Integer', such as 5, or of 'Float' or 
of 'String'.

\begin{axiom}
d:=dom(sample()$SQMATRIX(5,FRAC INT))
\end{axiom}

This brings up two problems: We will need some type of parser to take a string 
like 
'"SQMATRIX(5,INT)' into its 'SExpression' and we need to expand abbreviations. 
Replacing line 9 by::

   coerce(x:String):% == (convert(x)$SExpression) pretend %

did not work, even when the string 'x' looked like the 'SExpression' (as nested 
list of lisp symbols). This was  because 'x' remained
a string, which is an 'atom' in 'SExpression'.

The domain constructor 'ListAsDomain' generalizes the hypothetical 
'TagsAsDomain' discussed earlier. As noted earlier, lines 8-12 are needed 
because coerce to 'OutputForm' does not work for 'SExpression'. 

\begin{axiom}
)abbrev domain LDOMAIN ListAsDomain
++ Author: William Sit
++ Date Created: July 15, 2007
++ Description: This domain is needed to enable mappings from finite lists
++   to other finite lists. Maps in Axiom need source and target domains.
++   Axiom currently allows constructions of finite lists, but only as
++   objects in a domain, not as a domain itself. This domain lifts such lists
++   into a domain.

ListAsDomain(S:SetCategory, T: List S): Spec == Imp where                   -- 1
   Spec == Join(SetCategory, CoercibleTo(S), CoercibleTo(OutputForm)) with  -- 2
      coerce: S-> %                                                         -- 3
   Imp ==  add                                                              -- 4
      Rep:= S                                                               -- 5
      x = y == ([EMAIL PROTECTED]) = ([EMAIL PROTECTED])                        
                    -- 6              
      coerce(x:%):S == [EMAIL PROTECTED]                                        
        -- 7
      coerce(x:%):OutputForm ==                                             -- 
8                                   
        if S is SExpression then                                            -- 9
           [EMAIL PROTECTED] pretend OutputForm                                 
        --10
        else                                                                --11
           (coerce([EMAIL PROTECTED])@OutputForm)$S                             
        --12
      coerce(x:S) ==                                                        --13
        not member?(x, T) => error "not an element of domain"               --14
        x pretend %                                                         --15
\end{axiom}


Here is a sample illustration. 
\begin{axiom}
)clear all
a:="Integer"::MyDomain
b:="String"::MyDomain
p:=ListAsDomain(MyDomain, [a,b])
q:=ListAsDomain(Integer, [1,2,3])
1::q
g(i:q):q == (if ((i::Integer)) = 2 then (3::q) else (1::q))
[g(1), g(2), g(3)]
g(4)
\end{axiom}

Note that in the line defining 'q', 'a' cannot be used instead of 'Integer' 
because 'a' is really not a domain.
Unfortunately, while 'q' works, 'p' does not, since 'a::p' will crash Axiom! 
Don't know why yet.

Warning: Here the map 'g' is not a domain constructor as specified for 
'IndexedUnion'. This could be the source of the crash here and the one 
mentioned later when 'SetAsDomain' is used.



\begin{axiom}
)abbrev domain SDOMAIN SetAsDomain
++ Author: William Sit
++ Date Created: July 15, 2007
++ Description: This domain is needed to enable mappings from finite sets
++   to other finite sets. Maps in Axiom needs source and target domains.
++   Axiom currently allows constructions of finite sets, but only as
++   objects in a domain, not as a domain itself. This domain lifts such sets
++   into a domain.
SetAsDomain(S:SetCategory, T: Set S): Spec == Imp where                     -- 1
   Spec == Join(SetCategory, CoercibleTo(S), CoercibleTo(OutputForm)) with  -- 2
      coerce: S-> %                                                         -- 3
   Imp ==  add                                                              -- 4
      Rep:= S                                                               -- 5
      x = y == ([EMAIL PROTECTED]) = ([EMAIL PROTECTED])                        
                    -- 6              
      coerce(x:%):S == [EMAIL PROTECTED]                                        
        -- 7
      coerce(x:%):OutputForm ==                                             -- 
8                                   
        if S is SExpression then                                            -- 9
           [EMAIL PROTECTED] pretend OutputForm                                 
        --10
        else                                                                --11
           (coerce([EMAIL PROTECTED])@OutputForm)$S                             
        --12
      coerce(x:S) ==                                                        --13
        not member?(x, T) => error "not an element of domain"               --14
        x pretend %                                                         --15
\end{axiom}

There is little difference in terms of the code for 'ListAsDomain' and 
'SetAsDomain'.  However, 'ListAsDomain' is more general than 'SetAsDomain' in 
the sense that the indices need not be distinct. One may wonder why that might 
be useful. Suppose we want to create the set of 'IntegerMod(n)' where 'n' is a 
certain sequence defined by a formula $s_n$ where $s_n$ need not be distinct. 
One can use the natural numbers to index, but then we miss the fact that some 
of objects are equal. Anyway, it is convenient to use 'List' instead of 'Set' 
(Axiom has many examples where 'Set' should have been used but 'List' is used 
instead.) While we are at it, we might as well also allow 'StreamAsDomain' as 
long as things are finite. The finiteness condition is required because we want 
to make sure that set of objects included in the domain is exactly the given 
set, list, or stream. If we trust the user, we do not need to test for 
membership in line 14.

Known Problem:  The above does not work with 'SetAsDomain', not even::

\begin{axiom}
r:=SetAsDomain(Integer, brace{[1,2,3]})  
\end{axiom}

The next line will crash or go into infinite loop::

  1::r -- crashes, goes into infinite loop

Can anyone help find the problem?

\begin{axiom}
)abbrev domain STDOMAIN StreamAsDomain
++ Author: William Sit
++ Date Created: July 15, 2007
++ Description: This domain is needed to enable mappings from finite streams
++   to other finite streams. Maps in Axiom need source and target domains.
++   Axiom currently allows constructions of finite streams, but only as
++   objects in a domain, not as a domain itself. This domain lifts such streams
++   into a domain.

-- requires that Stream S has finiteAggregate in order to use member?
StreamAsDomain(S:SetCategory, T: Stream S): Spec == Imp where
   Spec == Join(SetCategory, CoercibleTo(S), CoercibleTo(OutputForm)) with
      if Stream S has finiteAggregate then coerce: S-> %
   Imp ==  add
      Rep:= S
      x = y == ([EMAIL PROTECTED]) = ([EMAIL PROTECTED])
      coerce(x:%):S == [EMAIL PROTECTED]
      coerce(x:%):OutputForm == 
        if S is SExpression then
           [EMAIL PROTECTED] pretend OutputForm
        else
           (coerce([EMAIL PROTECTED])@OutputForm)$S
      if Stream S has finiteAggregate then
        coerce(x:S) == 
          not member?(x, T) => error "not an element of domain"
          x::%
\end{axiom}


--
forwarded from http://wiki.axiom-developer.org/[EMAIL PROTECTED]

[IndexedUnion] (new) New topic: IndexedUnion

Reply via email to