On Sun, Feb 22, 2015 at 7:29 PM, Sturla Molden <sturla.mol...@gmail.com> wrote:
>
> On 22/02/15 19:21, Aldcroft, Thomas wrote:
>
> > Problems like this are now showing up in the wild [3]. Workarounds are
> > also showing up, like a way to easily convert from 'S' to 'U' within
> > astropy Tables [4], but this is really not a desirable way to go.
> > Gigabyte-sized string data arrays are not uncommon, so converting to
> > UCS-4 is a real memory and performance hit.
>
> Why UCS-4? The Python's internal "flexible string representation" will
> use ascii for ascii text.

numpy's 'U' dtype is UCS-4, and this is what Thomas is referring to, not Python's string type. It cannot have a flexible representation as it *is* the representation. Python 3's `str` type is opaque, so it can freely choose how to represent the data in memory. numpy dtypes transparently describe how the data is represented in memory.

--
Robert Kern
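
To make the difference concrete, here is a minimal sketch, not anything from astropy or the linked reports. The array contents and variable names are made up for illustration, and the `sys.getsizeof` comparison relies on CPython's flexible string representation (PEP 393), which is an implementation detail rather than a language guarantee.

```python
import sys
import numpy as np

# Hypothetical sample data; any ASCII-only strings would do.
words = ["alpha", "beta", "gamma"]

# 'S5': fixed-width byte strings, 1 byte per character.
as_bytes = np.array(words, dtype="S5")
# 'U5': fixed-width UCS-4, 4 bytes per character, regardless of content.
as_unicode = as_bytes.astype("U5")

print(as_bytes.dtype.itemsize, as_bytes.nbytes)      # 5 15
print(as_unicode.dtype.itemsize, as_unicode.nbytes)  # 20 60

# Python's str is opaque, so CPython is free to pick a narrower layout
# for ASCII-only text and a wider one only when the content needs it.
ascii_text = "a" * 1000
wide_text = "a" * 999 + "\U0001F600"  # one non-BMP character forces 4 bytes/char
print(sys.getsizeof(ascii_text), sys.getsizeof(wide_text))
```

The dtype fixes the itemsize up front, so every element of a 'U' array costs four bytes per character even when the data is pure ASCII. That is the 4x growth Thomas describes for an 'S' to 'U' conversion.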
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion