On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[snip]
Using std.algorithm or std.range requires learning about ranges. You shouldn't be surprised that string handling with ranges works differently from the specialized string handling functions that are the norm in most languages. For anyone with even a cursory knowledge of ranges and range algorithms, it's no surprise when the result of a range composition is not of string type even when the input is a string.
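To make that concrete, here is a minimal sketch (the helper name `reverseLetters` is made up for illustration): composing `retro` and `filter` over a `string` gives a lazy range of `dchar`, and you only get a `string` back if you ask for one explicitly.

```d
import std.algorithm : filter;
import std.array : array;
import std.conv : to;
import std.range : ElementType, retro;
import std.uni : isAlpha;

// Hypothetical helper: reverse a string and keep only its letters.
string reverseLetters(string s)
{
    // The composition is a lazy range of code points, not a string.
    auto r = s.retro.filter!isAlpha;
    static assert(!is(typeof(r) == string));
    static assert(is(ElementType!(typeof(r)) == dchar));

    // Materialize as dchar[] first, then transcode to UTF-8.
    return r.array.to!string;
}

void main()
{
    assert(reverseLetters("h\u00E9llo, w\u00F6rld") == "dlr\u00F6woll\u00E9h");
}
```

The conversion at the end is the only place that allocates; everything before it is evaluated lazily, one code point at a time.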
If you don't want to learn about ranges, use std.string. If std.string is not sufficient, then you should consider learning about ranges, which means accepting that yes, things will be different. Learning about ranges and how to use them for string manipulation is not the easiest thing right now due to a dearth of learning material, but that's not a problem with ranges themselves. Compiler error messages are indeed part of the problem, but they are a work in progress. 2.065 contains an incremental improvement to error messages on failure of overload resolution (thanks, Kenji).
About Unicode, the unit that the language promotes and the standard library embraces is `dchar`, the Unicode code point. The choice of not using graphemes is a compromise between correctness and performance. That means that the onus is still on the user to cover the last mile of correctness, so the user is not exempt from having to learn at least the basics of Unicode in order to write Unicode-correct code in D. However, this is a surprisingly reasonable compromise: as long as all inputs are normalized to the same form (which may require std.uni.normalize if the source of the input does not guarantee a particular form), then outside of contrived examples it's very hard to break grapheme clusters with range-based code, even though strings are ranges of code points. Explicit handling of graphemes is typically only needed in very specific domains, such as writing a text rendering library or a text input box. Thus typical range-based string manipulation tends to be correct even for multi-code-point graphemes, without the author having to consciously handle it.
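As a rough sketch of why normalization matters here (the helper name `sameText` is made up for illustration): the same visible text can arrive precomposed or decomposed, and the two only compare equal once both are brought to the same normalization form.

```d
import std.range : walkLength;
import std.uni : NFC, normalize;

// Hypothetical helper: compare two strings after normalizing both to NFC.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "caf\u00E9";  // 'é' as one code point, U+00E9
    string decomposed  = "cafe\u0301"; // 'e' followed by combining acute, U+0301

    // Different code point sequences for the same visible text.
    assert(precomposed != decomposed);
    assert(precomposed.walkLength == 4);
    assert(decomposed.walkLength == 5);

    // Equal once both are in the same form.
    assert(sameText(precomposed, decomposed));
}
```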
2.065 has std.uni.byGrapheme/byCodePoint for range-based grapheme manipulation. However, there is a performance cost involved so I recommend against using it dogmatically. The result of `byGrapheme` is not bidirectional yet - someone needs to take the time to implement `decodeGraphemeBack` and/or `graphemeStrideBack` first.
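A minimal sketch of the difference between iterating by code point and by grapheme (the helper name `countGraphemes` is made up for illustration):

```d
import std.algorithm : equal;
import std.range : walkLength;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: count user-perceived characters.
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string s = "cafe\u0301"; // 'e' + combining acute accent

    assert(s.length == 6);           // UTF-8 code units
    assert(s.walkLength == 5);       // code points (the default range view)
    assert(countGraphemes(s) == 4);  // grapheme clusters

    // byCodePoint flattens a range of graphemes back into code points.
    assert(s.byGrapheme.byCodePoint.equal(s));
}
```

Grapheme segmentation does extra work per element, which is where the performance cost comes from, so it makes sense to reach for it only where cluster boundaries actually matter.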
