Charles R Harris wrote:
> That is due to type promotion for the ufunc call:
>
> In [17]: a1 = np.array('a\x00\x00\x00')
>
> In [21]: np.array(['a'], dtype=a1.dtype)[0]
> Out[21]: 'a'
>
> In [22]: np.array(['a'], dtype=a1.dtype).tostring()
> Out[22]: 'a\x00\x00\x00'

It took me a bit to figure out what this meant, so in case I'm not the only one, I thought I'd spell it out:

In [3]: s1 = np.array('a')

In [4]: s1.dtype
Out[4]: dtype('|S1')

So s1's dtype is a length-1 string.

In [11]: s2 = np.array('a\x00\x00')

In [12]: s2.dtype
Out[12]: dtype('|S3')

And s2's is a length-3 string.

In [13]: s1 == s2
Out[13]: array(True, dtype=bool)

When they are compared, s1's dtype is coerced to a length-3 string by padding with nulls, and thus they compare equal.

Otherwise, there is nothing special about zero bytes in a string:

In [14]: s3 = np.array('\x00a\x00')

In [15]: s3 == s2
Out[15]: array(False, dtype=bool)

In [16]: s3 == s1
Out[16]: array(False, dtype=bool)

The problem is that zero bytes are the only way to pad a string, so trailing nulls that are real data cannot be told apart from padding.

I suppose the comparison could be smarter, by comparing without coercing, but that may not be possible without the ufunc machinery.

As for printing, I think it simply reflects that numpy strings are null padded, and most people probably wouldn't want to see all those nulls every time.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
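
For anyone who wants to reproduce the session above on Python 3, here is a minimal sketch. It assumes a recent NumPy and uses byte-string literals (b'...'), because plain str literals there produce a Unicode ('U') dtype rather than the 'S' dtype shown in the thread. It also uses tobytes() in place of the older tostring() spelling. The variable names for the results are only illustrative.

```python
import numpy as np

# Byte strings give the fixed-width 'S' dtype discussed in the thread.
s1 = np.array(b"a")          # dtype S1
s2 = np.array(b"a\x00\x00")  # dtype S3
s3 = np.array(b"\x00a\x00")  # dtype S3

# Comparison treats the shorter string as null padded,
# so trailing nulls do not affect equality.
padded_equal = bool(s1 == s2)

# Leading or embedded nulls are ordinary bytes and do matter.
leading_null_equal = bool(s3 == s2)

# The raw buffer keeps the nulls, but reading the element back
# drops the trailing ones, since they look the same as padding.
raw_bytes = s2.tobytes()
element = s2.item()

# An explicit cast to a wider dtype pads with nulls in the same way.
widened = s1.astype("S3").tobytes()
```

The last two results show both sides of the issue: the stored bytes still contain the nulls, yet the value handed back to Python has lost them.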