On 30/06/2012, at 7:49 AM, Shayne Fletcher wrote:

> On 29/06/2012 06:28, john skaller wrote:
>> If an array has length N + M it is the joining of two arrays, length
>> N and length M. So naturally the index has type N + M.
>> This type has two cases:
>>
>> (case 0 of N + M) n // where n in N
>> (case 1 of N + M) m // where m in M
>>
>> in other words, you first choose which array to access,
>> then pick an index for it of the appropriate type.
>
> Took me a while to understand this bit but I get it now.
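[For concreteness, a minimal C sketch of the quoted idea; the struct and
function names here are mine, purely illustrative, not Felix: an index into
the concatenation of an N-array and an M-array is a tag choosing the array,
plus an index of the appropriate type.]

    /* Hypothetical sketch: an index of type N + M into the
       concatenation of a (length N) and b (length M). */
    #include <stdio.h>

    enum which { LEFT, RIGHT };   /* case 0 of N + M, case 1 of N + M */

    typedef struct { enum which tag; int ix; } sum_index;

    static int get(const int *a, const int *b, sum_index k) {
        return k.tag == LEFT ? a[k.ix] : b[k.ix];
    }

    int main(void) {
        int a[3] = {10, 20, 30};          /* N = 3 */
        int b[2] = {40, 50};              /* M = 2 */
        sum_index k = { RIGHT, 1 };       /* (case 1 of 3 + 2) 1 */
        printf("%d\n", get(a, b, k));     /* prints 50 */
        return 0;
    }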
Took me a week! But I think I have a result on the way.

[BTW: I still get confused with row-major vs column-major indexing of
matrices, so some of the discussion below is going to mess up the order ..
the actual algorithms will produce results which are consistent, one way or
the other .. even if my brain isn't]

Now, this turns out to be even more pervasive than I thought. The formula is
TRIVIAL! Of course!! For an "index type sum" which is an arbitrary
combination of tuples (think multi-dimensional arrays), sums (think
concatenating arrays), and unitsums (think ordinary array indices), the
formula is simple enough:

  second thing * size of first thing + first thing

  y = m * x + b

So dang obvious. For example, for a 2D array:

  i * dim j + j

For a sum of T it's:

  rep of case arg * #cases + case index

The decoding is trivial:

  top index = value % ncases
  top arg   = value / ncases

If you want to go down the tree you just apply the above recursively.

The impact is phenomenal. First, note that a single integer is always enough
to span any array (you really don't have that much address space to do
bigger arrays!). Now, using indices of types 6 or 23 or whatever .. note
that the encoding is compact. This means that if you have a total of n
possible indices, the representation is exactly 0 .. (n-1). In other words
you can have, say, a 20-dimensional array and the 20-element tuple index is
guaranteed to fit in a single integer.

It also means a fold or iteration over all the values of the matrix remains
a trivial loop over 0 .. (n-1). The same implementation of folds, maps, etc
etc works for matrices of ALL ranks.

And note again, it isn't just matrices. We can also handle sums etc, so the
operations also handle

  (1,2,(3,(4,5),6,7))

provided you coerce it to a linear array with a complicated index, instead
of a complicated tuple. That thing above .. that's an **array** in that form
.. a LINEAR array.

So we not only have standard array polyadicity here, with folds and maps
over arbitrary rank .. we have generalised the notion of rank to an
arbitrary combination of tuples and cases over finite cyclic groups. And the
indexing, in either the multi-stage complicated tuple form or the linearised
array with nasty indices, is 100% type safe and requires no array bounds
checking. So we have an unusual?? situation where the maths not only gives
us a safety guarantee and polyadic behaviour .. it also guarantees an
optimally compact representation.

There is a small performance cost. A tuple index k = (i,j) where i in 3 and
j in 7 will have to be decoded by

  i = k % 3
  j = (k / 3) % 7   // the % 7 is redundant

i.e. we require integer multiplication, division, and modulus to split out
the components. This *should* be faster than int * int because that requires
two machine words (cache hit x 2). Of course if you use powers of 2 the C
compiler should reduce the operations to bitshifts.

You need to observe carefully that not every value "uses all the bits".
It's pretty obvious:

  1 + 3

Here, the first case is 1, so there's no argument and we use exactly 1 bit,
which is 0. For the second case the low bit is 1, so the remaining three
cases are decided by

  (k / 2)   // % 3, ranges over 0 to 2

Whilst it may be faster to "round up" to a power of two, it is VITAL that we
DO NOT DO THIS RUBBISH IN THE ALGORITHM. If the user wants it they should
make sums with powers of two cases. The reason is .. the compactness of the
representation is VITAL for iteration over 0 .. (n-1). We do not want bit
patterns that don't represent anything.
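[To make the formula concrete, here is a small hypothetical C sketch; the
names enc, fst and snd are mine, purely illustrative. It packs a pair (i,j)
with i in 3 and j in 7 into one int and recovers the components with the
projections above; the assert checks that the representation is exactly
compact, i.e. the round trip over 0 .. 20 is the identity.]

    /* Hypothetical sketch of the encoding k = j * di + i, i in di,
       j in dj, and its projections by modulus and division. */
    #include <assert.h>
    #include <stdio.h>

    static int enc(int i, int j, int di) { return j * di + i; }
    static int fst(int k, int di)        { return k % di; }
    static int snd(int k, int di)        { return k / di; }

    int main(void) {
        int di = 3, dj = 7;
        /* iterate the whole index space with one trivial loop 0 .. n-1 */
        for (int k = 0; k < di * dj; ++k) {
            int i = fst(k, di), j = snd(k, di);
            assert(enc(i, j, di) == k);   /* compact: round trip is exact */
        }
        printf("all %d indices round-trip\n", di * dj);
        return 0;
    }

[Deeper nestings are just the same two operations applied recursively to the
quotient, as described above.]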
As mentioned, there's more. Consider the tuple:

  (1,2)

That requires two machine words to represent, and uses a TRICK to decode it:
address calculations. This is how conventional languages work. They use a
trick: arithmetic is secretly modular, typically mod 2^32.

Now consider a tuple of type

  3 * 7

Note: that's a TYPE, it means values mod 3 for the first index and mod 7 for
the second. That tuple can be represented as a single machine word with the
standard formula

  k = i + 3 * j   // one machine word

with decoders -- aka projection functions:

  i = k % 3
  j = k / 3   // ( % 7 but that's not required here )

An int might not be enough for an arbitrary tuple, but as mentioned it is
always enough for an array index. I have to mention this because we may have
a problem if generalised array indices are represented compactly like this
but ordinary tuples of small finite cyclic groups are not. So there's more
work to make sure the representations line up, and that we use the same
representation in the same instances.

As a result of all this, I think that

  case 2 of 3 ++ == case 0 of 3

i.e. we'll use modular arithmetic, and indeed a special notation for unitsum
values may be introduced:

  1 mod 3   // instead of case 1 of 3

since that's more familiar. Note the mod must be a constant. This is a
constraint of conventional type systems. To make the modulus variable (an
arbitrary expression) we need dependent typing.
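[A tiny C sketch of the proposed modular behaviour; succ_mod is my own name
for the increment, purely illustrative: incrementing a value of unitsum type
3 wraps around, so case 2 of 3 ++ gives case 0 of 3.]

    /* Hypothetical sketch: increment on a value of unitsum type n
       wraps modulo n, so case 2 of 3 ++ == case 0 of 3. */
    #include <stdio.h>

    static int succ_mod(int v, int n) { return (v + 1) % n; }

    int main(void) {
        printf("%d\n", succ_mod(2, 3));   /* prints 0, i.e. case 0 of 3 */
        return 0;
    }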