[Felix-language] compact linear types explained

john skaller Thu, 26 Sep 2013 07:27:46 -0700

I am writing this to explain compact linear types to myself
so I can get the code right.


First, the word compact just means there are "no holes",
and linear means totally ordered.

A subrange of int 0..n-1 is a compact linear type denoted n,
it has n values.

Consider a pair:

        3 * 5

That's an integer in the set {0,1,2} followed by one in the set {0,1,2,3,4}.
If we represent this in two successive bytes the 16 bit word resulting
is not compact. But here is a compact representation:

        x * 5 + y // where x:3, y:5

It's clear there are 3*5=15 values, represented by integers in the
range 0..14, which is a compact linear type.

More generally this is just a standard positional number system
with arbitrary radix in each position. For the type

        3 * 5 * 7

the encoding is just

        x * 5 * 7 + y * 7 + z

If the radices are all the same, for example:

        2 * 2 * 2

then we just get the usual formula:

        x * 2 * 2 + y * 2 + z
        = x * 2 ^2 + y * 2^1 + z * 2 ^0

Note the exponent is equal to the digit position from right to left,
but tuples are numbered left to right.

So that's how products are encoded. What about sums?

Consider the sum type

        3 + 5

What does this mean? Well it means, in the first case we have a value
in {0,1,2} OR we have a value in the second case in {0,1,2,3,4}.
The obvious encoding scheme is:

        (x + 5) + y

 So basically the values 0,1,2 of the first case map to the values
5,6,7 and the values of the second case stay as they are.
More generally, we just add a value to the sum of the sizes
of all the cases after it. Unsurprisingly we have 3 + 5 = 8 values.

Now, to decode a sum K, we add the size of the right most N
terms until we get a value greater than K, then it is the N-1'th
case from the right. For example

        5 < 6 < 3 + 5, so N=2

which means we have 1 case from the right, which is case number 0
from the left.

To get the i'th term out of a product type, we first divide by the
product of all the sizes on the right, then take the remainder
of the size of the i'th term from the right. So for value 14 of
type 3 * 5, to get the first component we first divide by 5
which yields 2, then modulo by 3 also leaving 2. To get the
second term, we divide by 1, and modulo by 5, leaving 4,
and hence

        2 * 5 + 4 = 14

Note that for both products and sums I have chosen a
BIG ENDIAN encoding which means the tuple or sum
component number numbered from the right instead of the
left. This is basically because everyone is sure to believe
that the binary number

        10

represents 2: we read numbers from high to low, left
to right, so the low order digit is last. So the tuple

        (1,0) : 2 * 2

is encoded as value 2. However the index numbering
in code is left to right.  The encoding is counter intuitive
in one sense but it ensures if we sequence through the
values 0,1,2,3 we get the tuples (0,0), (0,1), (1,0), (1,1).

Well, that's it. Its that simple really, except we have to 
throw in recursion. If we note that the type

        3

is actually a sum

        3 = 1 + 1 + 1

representing 3 choices, where 1 is the type of the empty
tuple

        ()

which has 1 value, then a compact linear type is any
combination of sums and products of nothing:

        1 is a compact linear type
        any sum of compact linear types is compact linear
        any product of compact linear types is compact linear

That's it (actually we could throw in 0 too).

Why are compact linear types useful?

Obviously, they provide a guaranteed minimal representation.
However the linearity is useful too: by ranging from 0 through
to the size minus 1 of the type, you exhaust all possible
values.

In particular if we have, say, and array of length 3 of
arrays of length 5:

        (int ^ 5) ^ 3

there's an associated matrix:

        int ^ ( 3 * 5)

with indices which are values of the tuple type 3 * 5.
With our compact linear representation

        i = x * 5 + y

we can range i from 0 to 14, and cover all the elements of
the array of arrays in the order they're stored.

Note that the order reverses going from

        (int ^ 5) ^ 3 --> int ^ (3 * 5)

This seems weird but note:

        a . x . y --> m . (x,y)

results. The x index is applied first in both cases,
then the y term. y moves fastest, so adjacent y values
mean adjacent store. One has to note that in the expression:

        (int ^ 5) ^ 3


you proceed outside in, so you have to select a x:3 first,
then a y:5.

When you convert your array of arrays of int with two linear indices
to an array-array (matrix) of int with a single pair index AND you then
encode the pair as a compact linear type, you have got back to
a single linear array with a single linear index.

And that is how polyadic array addressing works.
Do both conversions, and you can scan all the elements
independently of how many dimensions it originally had.

I will note finally the strange fact that with compact linear indexing
this is actually an array:

        ( (0,1,2), (3,4,5,6,7) )

The type seems to be

        (int ^ 3) * (int * 5)

which is surely just a tuple not an array. But with magic is this
not also:

        int ^ ( 3 + 5)

which means a single index with a sum type. And recall the compact linear
encoding is 0 .. 7.

When you think what the indices mean, it means:

        first chose either the left or right subarray
        If you chose the left one, pick one of the 3 elements
        else you chose the right one so pick one of the 5 elements

and our theory just says that's the same as picking one of 8 elements.

Ok, so there is one more "fun" thing to note at the moment.
We had a complicated array thing like:

        int ^ (3 *5 + 2)

where the index type is compact linear. But remembering

        bool = 2

what about

        2 ^ (3 * 5 + 2)


This is an array with a compact linear index, but the base
type is also compact linear. And an array is a tuple!

So in this case *the array itself is compact linear*.

CAVEAT: in practice we have to limit the size of a
compact linear type to one that fits into some integer
type because in general we have to decode it with
integer operations like division and remainder.

The NATURAL use of such types is then array indices,
because arrays are always limited by available memory,
and thus in practice the index will always fit in a machine
word. Even for a bit array, you only need to ensure on a 64
bit machine you only have arrays of around 8'th size of
memory.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register >
http://pubads.g.doubleclick.net/gampad/clk?id=60133471&iu=/4140/ostg.clktrk
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

[Felix-language] compact linear types explained

Reply via email to