[Felix-language] polyadic arrays and isomorphisms

john skaller Sat, 05 Jan 2013 18:09:42 -0800

I'm still stuck on the order of indices :)

Given an array of arrays:


        T ^ I ^ J

the indexing order is fixed by the data structure:

        a . j . i

Note this is the reverse of the index order in the type. This is not arbitrary:
you have to select the subarray of the top level array first, then select
the element within it second. This means in j index moves fastest,
that is ++j moves sizeof T bytes. Note that using math notation:

        i (j a)

reverses the ordering. However postfix notation is easier for me to grok
for array subscripts.

Originally I just said that you can coerce the array to

        T ^ (I * J)

so with tuple multi-index the indexing is "invariant" in postfix notation:

        a . (i,j)

[however I screwed this up in one place in the compiler]

Note the j index still moves fastest. This is the natural lexicographical
sort ordering: think of strings: AA AB AC ...BA BB BC ... the LHS index
is the slowest moving in a linear order. however in math notation we get:

        (i,j) a

that is, the ordering in the tuple cannot just swap over: its the same for
both notations.

With this convention, we get the compaction formula:

        index = i  * len J + j

because i is jumping over subarrays of length J.

With this notation, a new index for the inner array

        T --> U ^ N
        U ^ N ^ I ^ J

goes on the end:

        u . j . i . n

whereas  if you put the whole thing as an element of a new array:

        T ^ I ^ J ^ K
        u . k . j . i


HOWEVER, whilst (n,i,j,k) is natural lexicographical ordering,
it is also "big endian" like a number system. The biggest digit goes first.
Little endian is nicer in many ways.

All of this is derived from the arbitrary decision:

        T ^ I ^ J -> T ^ (I * J)

i.e. the order of "indices" is preserved in the type transformation
instead of being reversed. It's not clear how sane this is given that
with T ^ J you have to read right to left:

        array length J of array length I of T

If we used -> as an exponential instead the whole
argument breaks down:

        a:J -> I -> T

Note -> is right associative. Both ^ and -> are exponentials
with reversed order of operands. Now the natural order of
indexing is invariant:

        a j i  // is a T

Note here the array is like a function. It is applied to the
first index, which has to be j, to select the inner array
which gives another function. That's applied to the second
index to give the final element. Now the "curried" form of
this function would normally be:

        a': J * I -> T
        a (j,i)

which is the opposite of what the previous analysis dictates.

Of couse it is a simple problem: do we preserve the order
of type indicies in the ^ notation or the -> notation?
We cannot have both, because reversing the operand
order:

        D -> C == C ^ D

where D suggests domain and C codomain,  this naturally
reverses the indexing order too.

Changing the definition of currying seems out of the question!
Quite simply the -> matches up the argument order with the
type whereas ^ swaps it.

So the question is: did I get the rule right, or should it be instead:

        T ^ I ^ J => T ^ (J * I)

and how does this impact the other natural laws of algebra:

        A * (B + C) => A * B + A * C
        A ^ (B + C) => A ^ B * A ^ C

etc?

All of these have meaning. however with types operations like + and *
are only commutative and associative up to isomorphism: the types:

        int * double * long
        int * (long * double)

are isomorphic but not equal. Here the isomorphism implies
that you can define a converter from generic types which is
fully parametric:

        convert: T * U * V -> T * ( V * U)

meaning that the function doesn't care what the types are.

However some isomorphism are more "natural" than others.
For example associativity is more natural than commutativity
because if you look at the representations in C:

        struct { int; double; long; }
        struct { int; struct { double; long; } }

for example, the elements are in the same order and the structures
are "layout compatible". This means we can use a reinterpret
cast on a reference or pointer to these types "safely" in the sense
that we don't need to actually move anything around or generate
any code (such casts do break strict aliasing rules of ISO C).

Exactly which isomorphisms are natural is determined by the
representation. In particular an isomorphism in the abstract
category of types TYPE may not be an isomorphism in
the types of the representation category (in this case C types)
in the sense of identity or in the sense of layout compatibility.

Indeed the representation encoding is a morphism from
Felix types to C types, and that is naturally regarded as a functor.
At a lower level its a natural transformation. Which means it preserves
isomorphisms.

The critical point here is that there are at least three kinds of isomorphism:

        (1) most general: requires moving stuff about
        (2) layout compatible: requires a cast only
        (3) identical. Nothing to do, they're the same

and which ones the language designer chooses DICTATES the 
properties of the language.

Currently Felix choses t his one at the abstract level:

        int * int * int = int ^ 3

as an equality. These two types are identical. At the C level
they're not!  Tuples are C structs likeL

        struct { T0; T1; T2; }

whereas an array is

        struct { T data[N]; }

These are layout compatible. To "fix" this the Felix compiler translates
all tuples of the same type into arrays. However it cannot do this fully
until AFTER monomorphisation. So you cannot apply array operations
to a generic tuple EVEN IF the types end up being the same.
This avoids "dependent name lookup" which is precisely the premature
application of operations to a type based on the fact that "it might"
end up converting (eg in C++ you can add U and V types. No problem!
Provided they end up as numbers! in Felix this is banned: you have
to use a type class constraint to allow it]

But there's a problem: you cannot prevent say using an expression to index
a tuple

        (1,2,3,4) . i

because the type of the LHS is int ^ 4. The isomorphism is applied
automagically, which means arrays must be a strict "subtype"
of tuples: any operation on tuples applies to arrays too.
But the above is a typo: I meant to write:

        (1,2,3,4) . 1

:-) It may be better to make the user apply the isomorphism:

        [< 1,2,3,4 >]

or perhaps:

        array_of_tuple (1,2,3,4)

Which would also simplify the compiler. In particular instead of magically
applying casts in the generated code, the user would have had to write
them.

Why does all this matter? The tuple/array thing is quite nice.
It works reasonably well. But now we're looking at a much larger
set of isomorphisms! The choices of which ones are identities
and which ones are casts etc are now critical.

Here's an example. A compact linear type is one like:

        3 + 3 *  (6 * 7)


The representation is a single integer. This means for a polymorphic
type we cannot calculate the representation. This is already true for
arrays vs tuples, but in that case the two are always layout
compatible.

Compact linear types are integers and are clearly NOT layout
compatible with tuples or sums. Introducing the representation
as magical (as I am doing) makes the compiler very complex.
Pointers requires three words instead of one (you have to
divide and modulo the integer to extract a component,
the same as shift and mask for bitfields).




--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnmore_123012
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

[Felix-language] polyadic arrays and isomorphisms

Reply via email to