On 07/21/2010 11:56 AM, John Salvatier wrote:
I don't really know much about this topic, but what about a flag at
array creation time (or whenever you define labels) that says whether
integer indexes will be treated as labels or as positions for that array?
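A minimal sketch of what such a creation-time flag could look like, in plain Python. The class `LabeledAxis` and the flag name `ints_are_labels` are hypothetical and are not part of datarray; this only illustrates the idea.

```python
class LabeledAxis:
    """Hypothetical 1-D container; `ints_are_labels` is the creation-time flag."""

    def __init__(self, values, ticks, ints_are_labels=False):
        self.values = list(values)
        self.ticks = list(ticks)
        self.ints_are_labels = ints_are_labels

    def __getitem__(self, key):
        # With the flag off, an int key is a plain position.
        if isinstance(key, int) and not self.ints_are_labels:
            return self.values[key]
        # Otherwise the key is looked up among the tick labels.
        return self.values[self.ticks.index(key)]


by_position = LabeledAxis("abc", ticks=[2, 0, 1])
by_label = LabeledAxis("abc", ticks=[2, 0, 1], ints_are_labels=True)

print(by_position[0])  # position 0
print(by_label[0])     # the element whose tick label is 0
```

One consequence visible in the sketch is that the same expression gives different results depending on how the object was created.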
On Wed, Jul 21, 2010 at 9:37 AM, Keith Goodman <kwgood...@gmail.com> wrote:
About a dozen people attended what was billed as a continuation of the
SciPy 2010 datarray BoF. We met at UC Berkeley on July 19 as part of
the py4science series.
A datarray is a subclass of a Numpy array that adds the ability to
label the axes and to label the elements along each axis.
We spent most of the time discussing how to index with tick labels.
The main issue is with integers: is an integer index a tick name or a
position index?
At the top level, datarrays always use regular Numpy indexing: an int
is a position, never a label. So darr[0] always returns the first
element of the datarray.
The ambiguity occurs in specialized indexing methods that allow
indexing by tick label name (because the name could be an int). To
break the ambiguity, the proposal was to provide several tick indexing
methods[1]:
1. Integers are always labels
2. Integers are never treated as labels
3. Try 1, then 2
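The three lookup rules above can be sketched in plain Python as follows. The function names are hypothetical and are not the methods added in [1]; each function maps a key to a position along one axis, given that axis's list of tick labels.

```python
# Hypothetical helpers; these names are not part of datarray.

def index_by_label(ticks, key):
    """Option 1: the key is always a tick label, even if it is an int."""
    return ticks.index(key)  # raises ValueError if the label is absent


def index_by_position(ticks, key):
    """Option 2: an int key is always a position, never a label."""
    if isinstance(key, int):
        if not -len(ticks) <= key < len(ticks):
            raise IndexError(key)
        return key % len(ticks)  # normalize negative positions
    return ticks.index(key)


def index_label_then_position(ticks, key):
    """Option 3: try the key as a label first, then fall back to position."""
    try:
        return index_by_label(ticks, key)
    except ValueError:
        return index_by_position(ticks, key)


ticks = [2, 0, 1]  # integer tick labels that differ from their positions

print(index_by_label(ticks, 0))             # position of the label 0
print(index_by_position(ticks, 0))          # position 0 itself
print(index_label_then_position(ticks, 0))  # label wins when it exists
```

With these ticks, options 1 and 2 disagree on the key 0, and option 3 silently picks the label reading whenever a matching label exists.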
We also discussed allowing axis labels to be any hashable object
(currently only strings are allowed). The main problem: integers.
Currently if an axis is labeled, say, "time", you can do
darr.sum(axis="time"). What happens when an axis is labeled with an
int? What does the 2 in darr.sum(axis=2) refer to? A position or a
label? The same problem exists for floats since a float is (currently)
a valid axis for Numpy arrays.
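To make the axis ambiguity concrete, here is a small sketch in plain Python. The helper `resolve_axis` is hypothetical and not part of datarray or Numpy; it reports both possible readings of an axis argument so the conflict is visible.

```python
def resolve_axis(axis_labels, axis):
    """Return (position_reading, label_reading); None means 'not valid'."""
    n = len(axis_labels)
    # Reading 1: the argument is a position, as in plain Numpy.
    as_position = axis if isinstance(axis, int) and -n <= axis < n else None
    # Reading 2: the argument is an axis label.
    as_label = axis_labels.index(axis) if axis in axis_labels else None
    return as_position, as_label


string_labels = ["time", "lat", "lon"]
int_labels = [2, 0, 1]  # assumed example of axes labeled with ints

print(resolve_axis(string_labels, "time"))  # only the label reading applies
print(resolve_axis(string_labels, 2))       # only the position reading applies
print(resolve_axis(int_labels, 2))          # both apply, and they disagree
```

With string labels exactly one reading is valid for any argument; with integer labels both can be valid and point at different axes.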
References:
[1] http://github.com/fperez/datarray/commit/3c5151baa233675b355058eb3ba028d2629bece5
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
The currently implemented option of allowing only strings is the only
practical one, and I think most other related languages impose the same
constraint. Otherwise we will effectively break compatibility with
Python and numpy, because darr[0] can give different answers depending
on the type of object involved, especially if you are using views and
forget the actual object type.
I also think we have to avoid adding complexity that increases runtime,
such as looking for the label 2 when the 2 should be an axis position.
We also have to avoid situations that lead to input errors, like flag
values or extra arguments.
Bruce