On Sun, Feb 22, 2015 at 2:46 PM, Charles R Harris <[email protected]> wrote:
>
> On Sun, Feb 22, 2015 at 3:40 PM, Aldcroft, Thomas
> <[email protected]> wrote:
>>
>> On Sun, Feb 22, 2015 at 2:52 PM, Nathaniel Smith <[email protected]> wrote:
>>>
>>> On Sun, Feb 22, 2015 at 10:21 AM, Aldcroft, Thomas
>>> <[email protected]> wrote:
>>> > The idea of a one-byte string dtype has been extensively discussed
>>> > twice before, with a lot of good input and ideas, but no action [1, 2].
>>> >
>>> > tl;dr: Perfect is the enemy of good. Can numpy just add a one-byte
>>> > string dtype named 's' that uses latin-1 encoding as a bridge to
>>> > enable Python 3 usage in the near term?
>>>
>>> I think this is a good idea. I think overall it would be good for
>>> numpy to switch to using variable-length strings in most cases (cf.
>>> pandas), which is a different kind of change, but fixed-length 8-bit
>>> encoded text is obviously a common on-disk format in scientific
>>> applications, so numpy will still need some way to deal with it
>>> conveniently. In the long run we'd like to have more flexibility (e.g.
>>> allowing choice of character encoding), but since this proposal is a
>>> subset of that functionality, then it won't interfere with later
>>> improvements. I can see an argument for utf8 over latin1, but it
>>> really doesn't matter that much so whatever, blue and purple bikesheds
>>> are both fine.
>>>
>>> The tricky bit here is "just" :-). Do you want to implement this? Do
>>> you know someone who does? It's possible but will be somewhat
>>> annoying, since to do it directly without refactoring how dtypes work
>>> first then you'll have to add lots of copy-paste code to all the
>>> different ufuncs.
>>
>> I would be happy to have a go at this, with the caveat that someone who
>> understands numpy would need to get me started with a minimal prototype.
>> From there I can do the "annoying" copy-paste for ufuncs etc, writing tests
>> and docs. I'm assuming that with a prototype then the rest can be done
>> without any deep understanding of numpy internals (which I do not have).
>>
>> - Tom
>>
>
> The last two new types added to numpy were float16 and datetime64. Might be
> worth looking at the steps needed to implement those. There was also a user
> type, `rational` that got added, that could also provide a template. Maybe
> we need to have a way to add 'numpy certified' user data types. It might
> also be possible to reuse the `c` data type, currently implemented as `S1`
> IIRC, but that could cause some problems.

float16 and rational probably aren't too relevant because they are
fixed-size types, and variable-size dtypes are much trickier. datetime64
will be more similar, but also adds its own irrelevant complexities -- you
might be best off just looking at how S and U work and copying them.

-n

--
Nathaniel J. Smith -- http://vorpus.org
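
For anyone wanting to see what "how S and U work" looks like from the Python side, here is a minimal sketch that assumes only the existing NumPy API. The proposed 's' dtype does not exist, so the latin-1 bridge is done by hand with np.char.decode and np.char.encode; the array contents and variable names are made up for illustration.

```python
import numpy as np

# 'S' is the fixed-width bytes dtype: one byte per character.
# The last entry contains 0xE9, which is "é" in latin-1 but not valid ASCII.
raw = np.array([b"alpha", b"beta", b"caf\xe9"], dtype="S5")

# 'U' is the fixed-width unicode dtype: four bytes per character.
# Decoding explicitly as latin-1 maps every byte to exactly one code point.
text = np.char.decode(raw, "latin-1")

# Encoding back with the same codec recovers the original bytes.
roundtrip = np.char.encode(text, "latin-1")

print("S itemsize:", raw.dtype.itemsize)
print("U itemsize:", text.dtype.itemsize)
print("decoded:", text.tolist())
print("round trip matches:", bool((roundtrip == raw).all()))
```

The larger itemsize of U relative to S is the memory cost that a one-byte text dtype would avoid, and the explicit decode/encode calls are the manual step it would make unnecessary.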
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
