Re: [S09] "Whatever" indices and shaped arrays

Jonathan Lang Tue, 27 Feb 2007 15:09:11 -0800

David Green wrote:

On 2/24/07, Jonathan Lang wrote:
>In effect, using * as an array of indices gives us the ordinals
>notation that has been requested on occasion: '*[0]' means 'first
>element', '*[1]' means 'second element', '*[-1]' means 'last
>element',
>'*[0..2]' means 'first three elements', and so on - and this works
>regardless of what the actual indices are.


Using * that way works, but it still is awkward, which makes me think
there's something not quite dropping into place yet.  We have the
notion of "keyed" indexing via [] and "counting"/ordinal indexing via
[*[]], which is rather a mouthful.  So I end up back at one of
Larry's older ideas, which basically is: [] for counting, {} for keys.


What if you want to mix the two?  "I want the third element of row 5".
In my proposal, that would be "@array[5, *[2]]"; in your proposal,
there does not appear to be a way to do it.

Unless the two approaches aren't mutually exclusive: "@array{5,
*[2]}".  That is, allow subscripted Whatevers within curly braces to
enable the mixing of ordinals and keys.  Since this is an unlikely
situation, the fact that nesting square brackets inside curly braces
is a bit uncomfortable isn't a problem: this is a case of making hard
things possible, not making easy things easy.
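
The mixed case can be modelled outside of Perl 6.  Below is a rough
Python sketch, not an implementation of anything in S09: "Ordinal"
stands in for the proposed "*[n]", and "KeyedGrid", "resolve", and the
sample keys are all names invented here for illustration.

```python
class Ordinal:
    """Hypothetical stand-in for the proposed *[n]: the n-th key by position."""
    def __init__(self, n):
        self.n = n


def resolve(keys, subscript):
    """Turn one subscript into an actual key of this dimension."""
    if isinstance(subscript, Ordinal):
        # Count positions in the ordered key list; negatives count from the end.
        return keys[subscript.n]
    if subscript not in keys:
        raise KeyError(subscript)
    return subscript


class KeyedGrid:
    """Two-dimensional store whose dimensions each have an ordered key list."""
    def __init__(self, row_keys, col_keys):
        self.row_keys = list(row_keys)
        self.col_keys = list(col_keys)
        self.cells = {}

    def _key(self, subscripts):
        row, col = subscripts
        return resolve(self.row_keys, row), resolve(self.col_keys, col)

    def __getitem__(self, subscripts):
        return self.cells[self._key(subscripts)]

    def __setitem__(self, subscripts, value):
        self.cells[self._key(subscripts)] = value


# Rows are keyed 1..10; columns are keyed 10, 20, 30, 40 (assumed sample data).
grid = KeyedGrid(row_keys=range(1, 11), col_keys=[10, 20, 30, 40])
grid[5, 30] = "x"
print(grid[5, Ordinal(2)])   # row with key 5, third column by position
```

The point of the sketch is only that a per-dimension choice between
"key" and "ordinal" is enough; nothing about the container itself has
to know which style the caller used.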

What about shaped arrays?  A "shape" means the indices *signify*
something (if they didn't, you wouldn't care, you'd just start at
0!).  So they really are *keys*, and thus should use a hash (which
may not use any hash tables at all, but it's still an associative
array because it associates meaningful keys with elements).  I'm not
put off by calling it a hash -- I trust P6 to recognise when I
declare a "hash" that is restricted to consecutive int keys, is
ordered, etc. and to optimise accordingly.


The one gotcha that I see here is with the possibility of
multi-dimensional arrays.  In particular, should multi-dimensional
indices be allowed inside square brackets?  My gut instinct is yes;
conceptually, "the third row of the fourth column" is perfectly
reasonable terminology to use.  The thing that would distinguish []
from {} would be a promise to always use zero-based, consecutive
integers as your indices, however many dimensions you specify.  With
that promise, you can always guarantee that the wrap-around semantics
will work inside [], while nobody will expect them to work inside {}.

In short, the distinction being made here isn't "unshaped" vs.
"shaped"; it's "ordinal indices" vs. "named indices", or "ordinals"
vs. "keys".
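
A small Python sketch of why that promise matters (again only a model,
with "normalize" and "extents" being names made up for this example):
if every dimension is known to run over 0..N-1, wrap-around can be
resolved one dimension at a time.

```python
def normalize(extents, index):
    """Map possibly-negative ordinals to zero-based positions.

    extents: number of ordinals in each dimension, e.g. (2, 3)
    index:   one ordinal per dimension; -1 means 'last', -2 'second to last'
    """
    if len(index) != len(extents):
        raise IndexError("wrong number of dimensions")
    positions = []
    for extent, i in zip(extents, index):
        if i < 0:
            i += extent          # wrap around from the end of this dimension
        if not 0 <= i < extent:
            raise IndexError("ordinal out of range")
        positions.append(i)
    return tuple(positions)


print(normalize((2, 3), (-1, -1)))   # last row, last column
```

With arbitrary keys there is no such rule to apply, which is why
nobody would expect the same behaviour inside {}.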

That said, note that - in the current conception, at least - one of
the defining features of a shaped array is that trying to access
anything outside of the shape will cause an exception.  How would
shapes work with the ordinals-and-keys paradigm?

First: Ordinals have some severe restrictions on how they can be
shaped, as specified above.  The only degrees of freedom you have are
how many dimensions are allowed and, for each dimension, how many
ordinals are permitted.  Well, also the value type (although the key
type is fixed as "Int where 0..*").  So you could say something like:

 my @array[2, 3, *]

...which would mean that the array must be three-dimensional; that the
first dimension is allowed two ordinals, the second is allowed three,
and the third is allowed any number of them - i.e., 'my @array[^2; ^3;
0..*]' in the current syntax.  Or you could say:

 my @array[2, **, 2]

...meaning that you can have any number of dimensions, but the first
and the last would be constrained to two ordinals each: 'my @array[^2;
**; ^2]'.

Note the use of commas above.  Since each dimension can only take a
single value (a non-negative integer), there's no reason to use a
multidimensional list to define the shape.  Personally, I like this
approach: it strikes me as being refreshingly uncluttered.
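
To illustrate how little information such a shape carries, here is a
Python sketch of a bounds check against it.  "STAR", "STARSTAR", and
"fits" are invented names, and treating '**' as "zero or more
dimensions" is an assumption on my part rather than anything the
proposal pins down.

```python
STAR = object()       # stands in for '*': any number of ordinals here
STARSTAR = object()   # stands in for '**': any number of dimensions here


def fits(shape, index):
    """Return True if the tuple of ordinals `index` lies inside `shape`."""
    if STARSTAR in shape:
        at = shape.index(STARSTAR)
        head, tail = shape[:at], shape[at + 1:]
        if len(index) < len(head) + len(tail):
            return False
        split = len(index) - len(tail)
        middle = index[len(head):split]
        # Dimensions absorbed by '**' are unconstrained but still ordinals.
        if any(i < 0 for i in middle):
            return False
        return fits(head, index[:len(head)]) and fits(tail, index[split:])
    if len(index) != len(shape):
        return False
    return all(i >= 0 and (limit is STAR or i < limit)
               for limit, i in zip(shape, index))


print(fits((2, 3, STAR), (1, 2, 500)))     # inside [2, 3, *]
print(fits((2, STARSTAR, 2), (1, 7, 9, 2)))  # last ordinal exceeds [2, **, 2]
```

An out-of-shape access would then raise an exception wherever "fits"
returns False, which matches the current behaviour of shaped arrays.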

Furthermore, you could do away with the notion of "shaped vs.
unshaped": just give everything a default shape.  The default shape
for arrays would be '[*]' - that is, one dimension with an
indeterminate number of ordinals.

Meanwhile, shapes for {} would continue to use the current syntax.
'[$x, $y, $z]' would be nearly equivalent to '{0..^$x; 0..^$y;
0..^$z}'.

If there are no meaningful lookup keys, if all I can do to get
through my list is count the items, then an array is called for, and
it can work in the usual way: start at 0, end at -1.  It is useful to
be able to count past the ends of an array, and * can do this by
going beyond the end: *+1, *+2, etc., or before the beginning: *-1,
*-2, etc.  (This neatly preserves the notion of * as "all the
elements" -- *-1 is the position before everything, and *+1 is the
position after everything else.)


Regardless, I would prefer this notion to the "offset from the
endpoint" notion currently in use.  Note, however, that [*-1] wouldn't
work in the ordinals paradigm; there simply is nothing before the
first element.  About the only use I could see for it would be to
provide an assignment equivalent of "unshift": '@array[*-1] = $x'
could be equivalent to 'unshift @array, $x'.  But note that, unlike
the 'push'-type assignments, this would change what existing ordinals
point to.
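
The asymmetry is easy to see in a Python analogy (the two helper names
are invented; Python lists merely play the role of an ordinal-indexed
array here):

```python
def assign_past_end(array, value):
    """Analogue of assigning to the position after the last element (push)."""
    array.append(value)


def assign_before_start(array, value):
    """Analogue of the proposed '@array[*-1] = $x' (unshift)."""
    array.insert(0, value)


letters = ["a", "b", "c"]

assign_past_end(letters, "d")
print(letters.index("a"))   # "a" is still at ordinal 0

assign_before_start(letters, "z")
print(letters.index("a"))   # "a" has moved to ordinal 1
```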

Meanwhile, {*-1} would only make sense in cases where keys are ordered
and new keys can be auto-generated.  Note also that {*+$x} is
compatible with {*[$x]}: the former would reference outside of the
known set of keys, while {*[$x]} would reference within them.

--
Jonathan "Dataweaver" Lang
