On Sun, Feb 22, 2015 at 5:46 PM, Charles R Harris <[email protected]> wrote:

>
>
> On Sun, Feb 22, 2015 at 3:40 PM, Aldcroft, Thomas <[email protected]> wrote:
>
>>
>>
>> On Sun, Feb 22, 2015 at 2:52 PM, Nathaniel Smith <[email protected]> wrote:
>>
>>> On Sun, Feb 22, 2015 at 10:21 AM, Aldcroft, Thomas
>>> <[email protected]> wrote:
>>> > The idea of a one-byte string dtype has been extensively discussed twice
>>> > before, with a lot of good input and ideas, but no action [1, 2].
>>> >
>>> > tl;dr: Perfect is the enemy of good.  Can numpy just add a one-byte string
>>> > dtype named 's' that uses latin-1 encoding as a bridge to enable Python 3
>>> > usage in the near term?
>>>
>>> I think this is a good idea. I think overall it would be good for
>>> numpy to switch to using variable-length strings in most cases (cf.
>>> pandas), which is a different kind of change, but fixed-length 8-bit
>>> encoded text is obviously a common on-disk format in scientific
>>> applications, so numpy will still need some way to deal with it
>>> conveniently. In the long run we'd like to have more flexibility (e.g.
>>> allowing choice of character encoding), but since this proposal is a
>>> subset of that functionality, it won't interfere with later
>>> improvements. I can see an argument for utf8 over latin1, but it
>>> really doesn't matter that much so whatever, blue and purple bikesheds
>>> are both fine.
>>>
>>> The tricky bit here is "just" :-). Do you want to implement this? Do
>>> you know someone who does? It's possible but will be somewhat
>>> annoying, since doing it directly, without first refactoring how
>>> dtypes work, means adding lots of copy-paste code to all the
>>> different ufuncs.
>>>
>>
>> I would be happy to have a go at this, with the caveat that someone who
>> understands numpy would need to get me started with a minimal prototype.
>> From there I can do the "annoying" copy-paste for ufuncs etc., writing tests
>> and docs.  I'm assuming that, given a prototype, the rest can be done
>> without any deep understanding of numpy internals (which I do not have).
>>
>> - Tom
>>
>>
>
> The last two new types added to numpy were float16 and datetime64. It might
> be worth looking at the steps needed to implement those. A user type,
> `rational`, was also added and could provide a template.
> Maybe we need to have a way to add 'numpy certified' user data types. It
> might also be possible to reuse the `c` data type, currently implemented as
> `S1` IIRC, but that could cause some problems.
>

OK I'll have a look at those.
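
For anyone following along, here is a minimal sketch of the gap being
discussed, using only existing numpy behaviour. The sample values are made up
for illustration, and the proposed 's' dtype does not exist, so it is not
shown. It only illustrates that the current 'S' dtype hands back bytes on
Python 3, and that decoding to the 'U' dtype costs 4 bytes per character.

```python
import numpy as np

# Fixed-width 8-bit text as it commonly appears on disk: one byte per character.
# (Sample values are hypothetical.)
raw = np.array([b"alpha", b"beta", b"caf\xe9"], dtype="S5")

# On Python 3 the elements of an 'S' array are bytes, not str.
first = raw[0]

# latin-1 maps every byte value to a code point, so decoding cannot fail,
# but the result is a 'U' array that stores 4 bytes per character.
text = np.char.decode(raw, "latin-1")

# The existing 'c' dtype mentioned above is a one-byte string type.
c_dtype = np.dtype("c")

print(raw.dtype, raw.itemsize)
print(text.dtype, text.itemsize)
print(c_dtype.kind, c_dtype.itemsize)
```

A one-byte dtype with a fixed latin-1 encoding would presumably keep the
storage of `raw` while giving the str behaviour of `text`.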

Thanks,
Tom


>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
