Re: Should this work?

Jakob Ovrum Thu, 09 Jan 2014 17:51:04 -0800

On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

[snip]

Using std.algorithm or std.range requires learning about ranges. You shouldn't be surprised that string handling with ranges works differently from specialized string handling functions, which is the norm in most languages. For anyone with even a cursory knowledge of ranges and range algorithms, it's no surprise when the result of a range composition is not of string type even when the input is a string.
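To make that concrete, here is a minimal sketch of a range composition over a string. The helper name `upperViaRange` is made up for this example. The point is only that `map` hands back a lazy range of `dchar`, and that getting a `string` again takes an explicit conversion step.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.range : ElementType;
import std.uni : toUpper;

// Hypothetical helper, named for this example only.
string upperViaRange(string s)
{
    // map yields a lazy range whose elements are dchar; it is not a string.
    auto r = s.map!(c => toUpper(c));
    static assert(!is(typeof(r) == string));
    static assert(is(ElementType!(typeof(r)) == dchar));

    // Materialize as dchar[], then transcode back to a UTF-8 string.
    return r.array.to!string;
}

void main()
{
    assert(upperViaRange("hello") == "HELLO");
}
```

If all you need is the upper-cased string, the `toUpper` overload that takes a string does it in one call and returns a string; the range version pays off once you start chaining further algorithms.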

If you don't want to learn about ranges, use std.string. If std.string is not sufficient, then you should consider learning about ranges, which means accepting that yes, things will be different. Learning about ranges and how to use them for string manipulation is not the easiest thing right now due to a dearth of learning material, but that's not a problem with ranges. Compiler error messages are indeed part of the problem, but they are a WIP. 2.065 contains an incremental improvement to error messages on failure of overload resolution (thanks, Kenji).

About Unicode, the unit that the language promotes and the standard library embraces is `dchar`, the Unicode code point. The choice of not using graphemes is a compromise between correctness and performance. That means that the onus is still on the user to cover the last mile of correctness, so the user is not exempt from having to learn at least the basics of Unicode in order to write Unicode-correct code in D. However, this is a surprisingly reasonable compromise: as long as all inputs are normalized to the same format (which may require std.uni.normalize if the source of the input does not guarantee a particular format), then outside of contrived examples it's very hard to break grapheme clusters by using range-based code, even though they are ranges of code points. Explicit handling of graphemes is typically only needed for very specific domains, like if you're writing a text rendering library or a text input box etc. Thus typical range-based string manipulation tends to be correct even for multi-code-point graphemes, without the author having to consciously handle it.
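A small sketch of the normalization point, assuming the input may arrive in either precomposed or decomposed form. The helper name `sameAfterNfc` is hypothetical. The same visible character "é" compares unequal as raw strings and as code point ranges until both sides are normalized to the same form.

```d
import std.range : walkLength;
import std.uni : normalize, NFC;

// Hypothetical helper: compare two strings after normalizing both to NFC.
bool sameAfterNfc(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E9";  // é as a single code point
    string decomposed  = "e\u0301"; // e followed by a combining acute accent

    // .length counts UTF-8 code units; walkLength counts code points (dchar).
    assert(precomposed.length == 2);
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // Different code point sequences, so a plain comparison fails...
    assert(precomposed != decomposed);
    // ...but they agree once both are in the same normalization form.
    assert(sameAfterNfc(precomposed, decomposed));
}
```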

2.065 has std.uni.byGrapheme/byCodePoint for range-based grapheme manipulation. However, there is a performance cost involved so I recommend against using it dogmatically. The result of `byGrapheme` is not bidirectional yet - someone needs to take the time to implement `decodeGraphemeBack` and/or `graphemeStrideBack` first.
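For the cases that do need it, a rough sketch of what grapheme-aware code looks like with these two functions. The helper name `graphemeCount` is invented for the example, and the sample text is arbitrary.

```d
import std.algorithm : equal;
import std.range : take, walkLength;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper: number of grapheme clusters in a string.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string s = "noe\u0308l"; // 'e' followed by a combining diaeresis

    assert(s.walkLength == 5);      // code points
    assert(graphemeCount(s) == 4);  // user-perceived characters

    // Take the first three graphemes, then flatten back to code points.
    // The combining mark stays attached to its base character.
    auto firstThree = s.byGrapheme.take(3).byCodePoint;
    assert(equal(firstThree, "noe\u0308"));
}
```

Taking three code points with a plain `s.take(3)` would have cut the cluster in half, which is the kind of case where paying for `byGrapheme` is justified.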
