On Fri, Mar 16, 2012 at 4:26 PM, Bryan Van de Ven <bry...@continuum.io>
wrote:
> Hi all,
>
> I have spent some time thinking about things, and discussing them with
> folks nearby. I actually got to wondering whether we really need new
> dtypes for
> this. It seems like enumerated values or factor levels could be cast as an
> annotation or metadata that could be attached to any existing integral
> dtypes. It spells differently enough that I have put up an alternate
> version that reflects this notion. I'd like to see what folks think of this
> direction:
>
>    https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum_alt.rst
>
> So this would require adding machinery to existing dtypes to behave
> properly when there is factor metadata present. Perhaps that is not an
> acceptable
> trade-off, but it seems worth discussing.

I took a look at this, but I think something was lost in the translation
from your head to text :-). Your description here makes it sound like
what's different about this proposal is the underlying mechanics, but
the enum_alt file just seems to describe an alternative and more-or-less
equivalent user-level API. If you hadn't told me otherwise, I would have
assumed that it just created a new dtype, rather than modifying existing
ones.

What mechanism are you thinking of? Or did I miss something?

> I think a very similar approach could be used to add categorical ranges to
> any numerical or string types (I think they are called "shingles" in R?)

A 'shingle' is a way of mapping (floating point) numbers into categories.
However, shingles generally allow a single number to fall into multiple
categories. So for example, you might take these data points:

 1  2  3  4  5  6  7  8  9  10 11

And divide them into categories A, B, C like this:

 1  2  3  4  5  6  7  8  9  10  11
 AAAAAAAAAAAAA
          BBBBBBBBBBBBB
                   CCCCCCCCCCCCCCC

Which is why they're called "shingles" :-)

http://www.floridadisaster.org/hrg/images/roofs/shingle_loose_tab_large.jpg

This can be a very convenient data structure for various sorts of
visualizations, but I'm not sure how it would make sense to integrate it
into basic numerical types.
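
To make the overlap concrete, here is a rough Python sketch of the A/B/C
picture above. The interval endpoints (1-5, 4-8, 7-11) are my reading of the
diagram, and the names `shingles` and `shingle_membership` are made up for
illustration; this is not how R implements shingles.

```python
# Hypothetical overlapping intervals read off the A/B/C diagram above.
# Each category is a closed interval (low, high); the endpoints are assumptions.
shingles = {"A": (1, 5), "B": (4, 8), "C": (7, 11)}


def shingle_membership(x, intervals):
    """Return every category whose closed interval contains x.

    Unlike a factor level, the result can hold zero, one, or several names.
    """
    return [name for name, (low, high) in intervals.items() if low <= x <= high]


for point in (2, 4, 6, 7):
    print(point, shingle_membership(point, shingles))
```

The point is that the lookup returns a list rather than a single level, which
is exactly what makes it awkward to store as one integer code per element.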

R has a more basic function called 'cut' which takes a numerical array plus
some specified breakpoints, and returns a factor array. But that's a simple
utility function that doesn't need any special features in the underlying
representation.
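
For comparison, a minimal sketch of a 'cut'-like helper on top of plain
NumPy, assuming right-closed intervals (lo, hi] as in R's default. The
function name `cut`, the (codes, labels) return value, and the use of -1 for
values outside the breaks are illustrative choices, not an existing NumPy
API.

```python
import numpy as np


def cut(values, breaks, labels):
    """Bin values into right-closed intervals (lo, hi] defined by sorted breaks.

    Returns (codes, labels): codes is an integer array indexing into labels,
    with -1 marking values that fall outside the breaks (R would give NA).
    """
    values = np.asarray(values, dtype=float)
    breaks = np.asarray(breaks, dtype=float)
    if len(labels) != len(breaks) - 1:
        raise ValueError("need exactly one label per interval")
    # With right=True, digitize returns i such that breaks[i-1] < x <= breaks[i].
    codes = np.digitize(values, breaks, right=True) - 1
    # Values at or below the first break, or above the last, have no interval.
    codes[(codes < 0) | (codes >= len(labels))] = -1
    return codes, list(labels)


codes, labels = cut(np.arange(1, 12), [0, 4, 8, 11], ["low", "mid", "high"])
print(codes)
print([labels[c] for c in codes])
```

Nothing here needs support from the dtype itself: the output is an ordinary
integer array plus a list of labels, i.e. a factor built by a utility
function.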

-- Nathaniel
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
