On Tue, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
> Hi! I'm taking a shot at issue #3184 [1], based on the observation
> that the string dtype ('S') under python 3 uses byte arrays instead
> of unicode (the only readable string type in python 3).
>
> This brings two major problems:
>
> * numpy code has to jump through hoops to open and read files as
>   binary data to load text into a bytes array, and does not play well
>   with users providing string (unicode) arguments
>
> * the repr of these arrays shows strings as b'text' instead of
>   'text', which breaks doctests of software built on numpy
>
> What I'm trying to do is make dtypes 'S' and 'U' equivalent
> (NPY_STRING and NPY_UNICODE).
>
> Now the question. Keeping 'S' and 'U' as separate dtypes (but same
> internal implementation) will provide the best backwards
> compatibility, but is more cumbersome to implement.

I am not sure how that can be possible. Those types are fundamentally
different in how they store their data: string types use one byte per
character, while unicode types use 4 bytes per character. You can maybe
default to unicode in more cases in python 3, but you cannot make them
identical internally.

What about giving `np.loadtxt` an encoding kwarg or something along
those lines?

- Sebastian

> Is it acceptable to internally just translate all appearances of 'S'
> (NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when
> running in python 3?
>
> The main drawback I see is that dtype reprs would not always be as
> expected:
>
>    # python 2
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='|S3')
>
>    # python 3
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='<U3')
>
>
> [1] https://github.com/numpy/numpy/issues/3184
>
>
> Cheers,
>   Lluis
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
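
To make the storage difference mentioned in the reply concrete, here is a minimal sketch, assuming a reasonably recent NumPy running under python 3. The array names are only illustrative. It compares the per-element size of an 'S3' and a 'U3' array and shows that going from one to the other takes an explicit decoding step rather than a reinterpretation of the same buffer.

```python
import numpy as np

# 'S' stores raw bytes, one byte per character.
s = np.array([b"foo", b"bar"], dtype="S3")

# 'U' stores UCS-4 code points, four bytes per character.
u = np.array(["foo", "bar"], dtype="U3")

print(s.dtype.itemsize)  # size in bytes of one 'S3' element
print(u.dtype.itemsize)  # size in bytes of one 'U3' element

# Converting bytes to text needs an explicit encoding; the two buffers
# have different layouts, so this produces a new array.
decoded = np.char.decode(s, "ascii")
print(decoded.dtype.kind)  # kind code of the decoded array
```

Since a three-character element occupies 3 bytes in one dtype and 12 in the other, treating 'S' as an alias for 'U' would change the memory layout of existing arrays, which is the concern raised above.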