On Wed, Apr 26, 2017 at 2:31 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Apr 26, 2017 9:30 AM, "Chris Barker - NOAA Federal"
> <chris.bar...@noaa.gov> wrote:
>
>
> UTF-8 does not match the character-oriented Python text model. Plenty
> of people argue that that isn't the "correct" model for Unicode text
> -- maybe so, but it is the model python 3 has chosen. I wrote a much
> longer rant about that earlier.
>
> So I think the easy to access, and particularly defaults, numpy string
> dtypes should match it.
>
>
> This seems a little vague? The "character-oriented Python text model" is
> just that str supports O(1) indexing of characters. But... Numpy doesn't. If
> you want to access individual characters inside a string inside an array,
> you have to pull out the scalar first, at which point the data is copied and
> boxed into a Python object anyway, using whatever representation the
> interpreter prefers. So AFAICT it makes literally no difference to the user
> whether numpy's internal representation allows for fast character access.
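
For reference, the scalar-extraction behaviour described in the quoted paragraph can be checked with a short sketch. The array contents and variable names are illustrative and not taken from the thread.

```python
import numpy as np

arr = np.array(['abcdefg', 'hij'])   # fixed-width unicode dtype, 'U7'

s = arr[0]                           # pulling out a scalar copies the data
is_numpy_scalar = isinstance(s, np.str_)
is_python_str = isinstance(s, str)   # np.str_ subclasses Python's str

ch = s[2]                            # character indexing happens on the scalar

arr[0] = 'XXXXXXX'                   # a later write to the array...
still_original = (s == 'abcdefg')    # ...does not change the extracted scalar
```

Indexing here goes through the boxed scalar, so it does not depend on how the array stores the characters internally.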

you can create a view on individual characters or bytes, AFAICS

>>> import numpy as np
>>> t = np.array(['abcdefg']*10)
>>> t2 = t.view([('s%d' % i, '<U1') for i in range(7)])
>>> t2['s5']
array(['f', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f'], dtype='<U1')
>>> t.view('<U1').reshape(len(t), -1)[:, 2]
array(['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c'], dtype='<U1')

Josef

>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
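
A note on why the view above works: NumPy's fixed-width 'U' dtype stores every code point in 4 bytes, so a 'U7' element can be reinterpreted as seven 'U1' elements without copying. The following is a minimal sketch, assuming a contiguous 1-D array in native byte order. The sample strings are illustrative, and strings shorter than the dtype width are null-padded, so they show up as empty strings in the view.

```python
import numpy as np

t = np.array(['abcdefg', 'hij'])      # dtype 'U7', 28 bytes per element
width = t.dtype.itemsize // 4         # 4 bytes per code point -> 7

# Reinterpret the same buffer as single characters, one row per string.
chars = t.view('U1').reshape(len(t), width)

third = chars[:, 2]                   # third character of every string
padding = chars[1, 3]                 # past the end of 'hij': empty string

shared = np.shares_memory(t, chars)   # it is a view, not a copy
chars[0, 0] = 'Z'                     # so writes go through to t
```

Because the result is a view, character-level reads and writes stay inside NumPy and never box individual strings into Python objects.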