On Thursday, 6 September 2018 at 10:22:22 UTC, ag0aep6g wrote:
On 09/06/2018 09:23 AM, Chris wrote:
Python 3 gives me this:

print(len("á"))
1

Python 3 also gives you this:

print(len("á"))
2

(The example might not survive transfer from me to you if Unicode normalization happens along the way.)

That's what you get when the 'á' is entered as 'a' followed by U+0301 (combining acute accent). So Python's `len` counts code points, like D's std.range does for strings (auto-decoding).
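To make the two cases survive copy and paste, here is a small sketch that spells both forms out with escape sequences. The variable names are only illustrative; nothing here comes from the original posts beyond the two `len` results.

```python
# Precomposed form: a single code point, U+00E1.
composed = "\u00e1"

# Decomposed form: 'a' followed by U+0301 (combining acute accent).
decomposed = "a\u0301"

# Both render as the same character, but len() counts code points.
print(len(composed))    # 1
print(len(decomposed))  # 2

# The two strings are not equal unless they are normalized first.
print(composed == decomposed)  # False
```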

To avoid this, you have to normalize the text and recompose any decomposed characters. I remember that Mac OS X used (and still uses?) decomposed characters by default, so when you typed 'á' into your CLI, it would automatically be decomposed into 'a' + combining acute. D's `string.length`, however, counts UTF-8 code units, so it returns 2 for the precomposed character too. If you do a lot of string handling, this will come back to bite you sooner or later.
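As a rough illustration of the recomposition step, here is a Python sketch using the standard `unicodedata` module. The byte counts are a stand-in for what a UTF-8 code unit count looks like; this is not D code, and the variable names are made up for the example.

```python
import unicodedata

# Decomposed input, as a terminal or file system might hand it over.
decomposed = "a\u0301"

# NFC recomposes 'a' + combining acute into the single code point U+00E1.
recomposed = unicodedata.normalize("NFC", decomposed)
print(len(recomposed))  # 1

# Counting UTF-8 code units (bytes) gives different numbers again:
# U+00E1 takes two bytes; 'a' plus U+0301 takes three.
utf8_recomposed = len(recomposed.encode("utf-8"))
utf8_decomposed = len(decomposed.encode("utf-8"))
print(utf8_recomposed)  # 2
print(utf8_decomposed)  # 3
```

So even after normalizing, a length measured in code units is not a character count.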
