On Tue, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
> Hi! I'm giving a shot to issue #3184 [1], based on the observation that the
> string dtype ('S') under python 3 uses byte arrays instead of unicode (the only
> readable string type in python 3).
> 
> This brings two major problems:
> 
> * numpy code has to go through loops to open and read files as binary data to
>   load text into a bytes array, and does not play well with users providing
>   string (unicode) arguments
> 
> * the repr of these arrays shows strings as b'text' instead of 'text', which
>   breaks doctests of software built on numpy
> 
> What I'm trying to do is make dtypes 'S' and 'U' equivalent (NPY_STRING and
> NPY_UNICODE).
> 
> Now the question. Keeping 'S' and 'U' as separate dtypes (but same internal
> implementation) will provide the best backwards compatibility, but is more
> cumbersome to implement.

I am not sure how that can be possible. Those types are fundamentally
different in how they store their data. String types use one byte per
character, while unicode types use 4 bytes per character. You can maybe
default to unicode in more cases in python 3, but you cannot make them
identical internally.
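
To make the storage difference concrete, here is a minimal sketch. The
variable names are only for illustration, and the 'U' dtype is pinned to
little-endian so the raw bytes look the same on every platform:

```python
import numpy as np

# The same three characters stored as bytes ('S') and as unicode ('U').
as_bytes = np.array([b"foo"], dtype="S3")
as_text = np.array(["foo"], dtype="<U3")

# 'S' stores one byte per character, 'U' stores four.
print(as_bytes.dtype.itemsize)
print(as_text.dtype.itemsize)

# The underlying buffers differ, so one cannot simply alias the other.
raw_bytes = as_bytes.tobytes()
raw_text = as_text.tobytes()

# Going from 'S' to 'U' is a real conversion into a new, larger buffer.
converted = as_bytes.astype("<U3")
```

So any translation of 'S' into 'U' has to allocate and decode, it cannot
be a relabelling of the same memory.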

What about giving `np.loadtxt` an encoding kwarg or something along
those lines?
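
From the user side, such a keyword might look roughly like the sketch
below. It assumes a NumPy release in which `np.loadtxt` accepts an
`encoding` argument (1.14 or newer); the temporary file and its contents
are made up for the example:

```python
import os
import tempfile

import numpy as np

# Write a small UTF-8 text file with one name per line (illustrative data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", encoding="utf-8", delete=False
) as f:
    f.write("Lluís\nSebastian\n")
    path = f.name

try:
    # Ask for unicode output and say how the file is encoded, so no
    # bytes array ever reaches the user.
    names = np.loadtxt(path, dtype="U", encoding="utf-8")
finally:
    os.remove(path)

print(names.dtype.kind)
print(names.tolist())
```

That keeps 'S' and 'U' as they are and only changes what the reader
hands back.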

- Sebastian


> 
> Is it acceptable to internally just translate all appearances of 'S'
> (NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when running in
> python 3?
> 
> The main drawback I see is that dtype reprs would not always be as expected:
> 
>    # python 2
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='|S3')
>
>    # python 3
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='<U3')
> 
> 
> [1] https://github.com/numpy/numpy/issues/3184
> 
> 
> Cheers,
>   Lluis
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
