17.01.2014 15:09, Aldcroft, Thomas wrote:
[clip]
> I've been playing around with porting a stack of analysis libraries
> to Python 3 and this is a very timely thread and comment. What I
> discovered right away is that all the string data coming from
> binary HDF5 files show up (as expected) as 'S' type, but that
> trying to make everything actually work in Python 3 without
> converting to 'U' is a big mess of whack-a-mole.
>
> Yes, it's possible to change my libraries to use bytestring
> literals everywhere, but the Python 3 user experience becomes
> horrible because to interact with the data all downstream
> applications need to use bytestring literals everywhere. E.g.
> doing a simple filter like `string_array == 'foo'` doesn't work,
> and this will break all existing code when trying to run in Python
> 3. And every time you try to print something it has this horrible
> "b" in front. Ugly, and it just won't work well in the end.
[clip]
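The situation in the quote can be reproduced with a small sketch. It assumes a plain in-memory 'S' array as a stand-in for the HDF5 data (no HDF5 library is involved) and assumes the stored text is ASCII; it reflects the behaviour of current NumPy versions, not necessarily the 2014 ones. It shows that on Python 3 an 'S' array needs b"..." literals, and that decoding once into a 'U' array with `np.char.decode` lets ordinary str literals work again.

```python
import numpy as np

# Stand-in for string data read from a binary HDF5 file: NumPy dtype 'S' (bytes).
raw = np.array([b"foo", b"bar", b"foo"], dtype="S3")

# On Python 3 each element is a bytes object, so a filter needs a b"..." literal,
# and printing an element shows the b prefix.
mask_bytes = raw == b"foo"
print(raw[0])  # b'foo'

# Decoding once into a 'U' (unicode) array lets downstream code use plain str
# literals again. The "ascii" codec is an assumption about how the data was written.
text = np.char.decode(raw, "ascii")
mask_text = text == "foo"
print(text[0])  # foo
```

The trade-off is that the decoded copy is a separate array, and, as the reply points out, a wider one.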
Ok, I see your point. Having additional Unicode data types with smaller widths could be useful. On Python 2, they would then be Unicode strings, right? Thanks to Py2's automatic Unicode encoding/decoding, they might also be usable interactively and in similar settings on Py2.

Adding new data types to the NumPy codebase takes some work, but it's possible to do.

There's also an issue (as noted in the GitHub ticket) that array([u'foo'], dtype=bytes) encodes silently via the ASCII codec. This is probably not how it should be.

-- 
Pauli Virtanen
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
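Both points in the reply can be illustrated with a short sketch against current NumPy; behaviour may differ from the version discussed in 2014, and the smaller-width Unicode types proposed here are not created by this code. It shows that the existing 'U' dtype spends 4 bytes per character (UCS-4), which is what motivates narrower types, and that converting str data to a bytes array without naming a codec goes through ASCII, so non-ASCII text fails while `np.char.encode` with an explicit codec works.

```python
import numpy as np

words = np.array(["foo", "caf\u00e9"])
# The 'U' dtype stores UCS-4: 4 bytes per character regardless of content,
# so the longest entry (4 characters) gives an itemsize of 16.
print(words.dtype.kind, words.itemsize)  # U 16

# Implicit conversion to a bytes array goes through the ASCII codec.
ascii_ok = np.array(["foo"], dtype=bytes)  # fine: pure ASCII
implicit_error = None
try:
    np.array(["caf\u00e9"], dtype=bytes)  # non-ASCII text cannot be encoded
except UnicodeEncodeError as exc:
    implicit_error = exc

# Naming the codec explicitly avoids relying on that implicit ASCII step.
utf8 = np.char.encode(words, "utf-8")
print(utf8.dtype.kind, utf8.itemsize)  # S 5
```

The UTF-8 copy needs 5 bytes per element here versus 16 for the 'U' array, which is the kind of saving a narrower Unicode type would aim for.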