On Thu, Jan 23, 2014 at 4:51 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
> On Thu, Jan 23, 2014 at 12:10 PM, <josef.p...@gmail.com> wrote:
>>
>> > Exactly -- but what should those conversion/casting rules be? We can't
>> > decide that unless we decide if 'S' is for text or for arbitrary bytes
>> > -- it can't be both. I say text, that's what it's mostly trying to do
>> > already. But if it's bytes, fine, then some things still need cleaning
>> > up, and we could really use a one-byte-text type. And if it's text,
>> > then we may need a bytes dtype.
>>
>> (remember I'm just a balcony muppet)
>
> me too ;-)
>
>> As far as I understand all codecs have the same ascii part.
>
> nope -- certainly not multi-byte codecs. And one of the key points of utf-8
> is that the ascii part is compatible -- none of the other full-unicode
> encodings are.
>
> many of the one-byte-per-char ones do share the ascii part, but not all, or
> not completely.
>
>> So I would
>> cast on ascii and raise on anything else.
>
> still a fine option -- clearly defined and quite useful for scientific text.
> However, I would prefer latin-1 -- that way you might get garbage for the
> non-ascii parts, but it wouldn't raise an exception and it round-trips
> through encoding/decoding. And you would have a somewhat more useful subset
> -- including the latin-language characters and symbols like the degree
> symbol, etc.
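
The codec points above can be checked with plain Python 3, no NumPy needed. This is only a small sketch: the sample strings and variable names are made up for illustration, and cp037 (an EBCDIC codec) is my own pick as an example of a one-byte codec that does not share the ascii part.

```python
# Sketch: which codecs agree with ascii, and ascii vs latin-1 decoding.
ascii_text = "degC"  # illustrative sample, pure ASCII

# Compare each codec's output for pure-ASCII text with the ascii codec's output.
same_as_ascii = {
    codec: ascii_text.encode(codec) == ascii_text.encode("ascii")
    for codec in ("utf-8", "latin-1", "cp037", "utf-16-le")
}

# latin-1 maps every byte value 0..255 to a code point, so any byte
# string decodes without error and encodes back to the same bytes.
all_bytes = bytes(range(256))
latin1_round_trips = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# A non-ASCII byte: 0xB0 is the degree sign in latin-1.
raw = b"25\xb0C"
as_latin1 = raw.decode("latin-1")

# The strict alternative: decoding as ascii raises on the same bytes.
try:
    raw.decode("ascii")
    ascii_raised = False
except UnicodeDecodeError:
    ascii_raised = True

print(same_as_ascii)
print(latin1_round_trips, repr(as_latin1), ascii_raised)
```

So "cast on ascii and raise" and "decode as latin-1" differ only on bytes above 127: the first refuses them, the second always gives some character back, which may or may not be the one that was intended.
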

I'm not sure anymore. After all these threads I think bytes should be
bytes and strings should be strings

>>> x = np.array(['hugo'], 'S')
Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    x = np.array(['hugo'], 'S')
ValueError: could not convert string to bytes: 'hugo'
>>> x = np.array([b'hugo'], 'S')
>>>

but with support for textarrays as Oscar showed, to make it easy to
convert between the 'S' and 'S:encoding' or use either view on the
memory.

I like the idea of an `encoding_view` on some 'S' bytes, and once we
have a view like that there is no reason to pretend 'S' bytes are
text.

>> or follow whatever the convention of numpy is:
>>
>> >>> s = -256
>> >>> np.array((s,), dtype=np.uint8)[0] == s
>> False
>> >>> s = -1
>> >>> np.array((s,), dtype=np.uint8)[0] == s
>> False
>
> I think text is distinct enough from numbers that we don't need to do that
> same thing -- and this is the result of well-defined casting rules built into
> the compiler (and hardware?) for the numeric types. I don't think we have
> either the standard or compiler support for text conversions like that.
>
> -CHB
>
> PS: this is interesting, on py2:
>
> In [176]: a = np.array((2222,), dtype='S')
>
> In [177]: a
> Out[177]:
> array(['2'],
>       dtype='|S1')
>
> It converts it to a string, but only grabs the first character? (is it
> determining the size before converting to a string?)

I recently fixed a bug in statsmodels based on this. I don't know why
the code worked before; I assume it used string integers instead of
integers at some point when it was written.

> and this:
>
> In [182]: a = np.array(2222, dtype='S')
>
> In [183]: a
> Out[183]:
> array('2222',
>       dtype='|S24')
>
> 24 ? where did that come from?

No idea.

Unless I missed something when I didn't pay attention, there was never
any discussion on the mailing list before about bytes versus strings
in python 3 in numpy (I don't follow numpy's "issues").
Nor do I remember (m)any public complaints about the behavior of the
'S' type in strange cases.

Maybe I didn't pay attention because I didn't care, until we ran into
the python 3 problems. Maybe nobody else did either.

Josef

>> Josef
>>
>> > Key here is that we don't have the option of not breaking anything,
>> > because there is a lot already broken.
>> >
>> > -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion