On Sunday, 9 March 2014 at 21:38:06 UTC, Nick Sabalausky wrote:
> On 3/9/2014 7:47 AM, w0rp wrote:

>> My knowledge of Unicode pretty much just comes from having to deal with foreign language customers and discovering the problems with the code unit abstraction most languages seem to use. (Java and Python suffer from similar issues, but they don't really have algorithms in the way that we do.)


> Python 2 or 3 (out of curiosity)? If you're including Python 3, then that somewhat surprises me, as I thought greatly improved Unicode support was one of the biggest reasons for the jump from 2 to 3. (Although it isn't *completely* surprising since, as we all know far too well here, fully correct Unicode is *not* easy.)

Late reply here. Python 3's Unicode support is a lot better than Python 2's. The situation in Python 2 was this:

1. The default string type is 'str', an immutable array of bytes.
2. The bytes in a 'str' could be in any of many encodings, UTF-16 included.
3. There is an extra 'unicode' type for when you want a real Unicode string.
4. Python implicitly converts between the two, often in wrong ways, causing exceptions to appear where you didn't expect them. (A short sketch of this follows the list.)
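
For anyone who dodged this era, here's a minimal Python 2 sketch of the implicit conversion biting you (variable names are just for illustration):

    # -*- coding: utf-8 -*-
    # Python 2: mixing 'str' and 'unicode' makes the interpreter
    # implicitly decode the 'str' with the ASCII codec behind your back.
    name = 'caf\xc3\xa9'       # 'str': five UTF-8 bytes spelling "café"
    greeting = u'hello '       # 'unicode'
    print(greeting + name)     # UnicodeDecodeError: 'ascii' codec
                               # can't decode byte 0xc3 in position 3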

In 3, this changed to...

1. The default string type is still named 'str', only now it's like the 'unicode' of olde.
2. 'bytes' is a new immutable array-of-bytes type, like the Python 2 'str'.
3. Conversion between 'str' and 'bytes' is always explicit (sketch below).
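
And the Python 3 equivalent, again just a quick stdlib-only sketch:

    # Python 3: you always convert explicitly with encode()/decode().
    name = "café"                  # 'str': a sequence of code points
    data = name.encode("utf-8")    # 'bytes': b'caf\xc3\xa9'
    assert data.decode("utf-8") == name
    print(name + data)             # TypeError: can only concatenate
                                   # str (not "bytes") to str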

However, Python 3 still works at the code point level (and before 3.3, narrow builds actually worked at the UTF-16 code unit level), and you don't see very many algorithms that take, say, combining characters into account. So Python suffers from similar issues.
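
To make the combining character point concrete, a small sketch using only the stdlib:

    import unicodedata

    s = "cafe\u0301"          # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    print(len(s))             # 5 code points, though a reader sees 4 graphemes
    print(s[::-1])            # reversing detaches the accent from the 'e'
    print(s == "caf\u00e9")   # False: same rendered text, different code points
    print(unicodedata.normalize("NFC", s) == "caf\u00e9")  # True

Normalization rescues this particular comparison, but not every grapheme has a precomposed form, so len(), slicing, and reversal still operate on code points rather than on what a user perceives as a character.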
