The Unicode standard is too complex for general-purpose algorithms to do anything truly useful with D strings. We rarely notice, however, because our own writing systems are sufficiently well supported.
As an inspiration I'll leave a string here that contains combining characters in Korean (http://decodeunicode.org/hangul_syllables) and Latin, as well as fullwidth characters that span two character cells of, e.g., Latin, Greek or Cyrillic script (http://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms):

Halfwidth / Fullwidth, ᆨᆨᆨᆚᆚᅱᅱᅱᅡᅓᅲᄄᆒᄋᆮ, a͢b 9͚ c̹̊

(I used the "unfonts" package for the Hangul part.)

What I want to say is that for correct Unicode handling we should either use existing libraries or get a feeling for what the Unicode standard provides, and then derive use cases from it. For example, when we talk about the length of a string we are actually talking about four different things:

- number of code units
- number of code points
- number of user-perceived characters (grapheme clusters)
- display width in a monospace font

The same distinctions apply to slicing, depending on the use case.

Related questions:

- Which normalization form do D strings use? Both Linux and Mac OS X use UTF-8, but the binary representation of non-ASCII file names differs between them.
- How do we handle sorting (collating) strings?

The subject matter is complex, but not difficult (as in rocket science). If we really want to find a solution, we should form an expert group and stop talking until we have read the latest Unicode specs. They are a moving target; don't expect to ever be "done" with full Unicode support in D.

-- Marco
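To make the four notions of "length" concrete, here is a small sketch using only the Python standard library (Python rather than D, purely for illustration). The sample string and the helper `cell_width` are my own; note that real grapheme segmentation follows Unicode UAX #29, and the approximation below merely skips combining marks, which happens to suffice for this example.

```python
import unicodedata

# Hangul syllables, halfwidth Katakana, and "a" + combining circumflex
s = "안녕 \uff8a\uff9b\uff70 a\u0302"

code_units_utf8 = len(s.encode("utf-8"))  # number of UTF-8 code units (bytes): 20
code_points = len(s)                      # number of code points: 9

# user-perceived characters: approximate by not counting combining marks
graphemes = sum(1 for c in s if not unicodedata.combining(c))  # 8

# display width in a monospace font: Fullwidth/Wide characters take
# two cells, combining marks take none (hypothetical helper)
def cell_width(c):
    if unicodedata.combining(c):
        return 0
    return 2 if unicodedata.east_asian_width(c) in ("F", "W") else 1

display_width = sum(cell_width(c) for c in s)  # 10

# Normalization: the same visual "é" can be one composed code point (NFC)
# or "e" + combining acute (NFD); the UTF-8 bytes differ accordingly.
nfc = unicodedata.normalize("NFC", "e\u0301")  # "\u00e9"
nfd = unicodedata.normalize("NFD", "\u00e9")   # "e\u0301"
```

Four different answers for one string, which is exactly why "length" is ambiguous without stating which of these you mean.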
