On Wednesday, October 24, 2012 12:42:59 mist wrote:
> On Tuesday, 23 October 2012 at 17:36:53 UTC, Simen Kjaeraas wrote:
> > On 2012-10-23, 19:21, mist wrote:
> >> Hm, and all phobos functions should operate on narrow strings
> >> as if they were not random-accessible? I am thinking about
> >> something like commonPrefix from std.algorithm, which operates
> >> on code points for strings.
> >
> > Preferably, yes. If there are performance (or other) benefits
> > from operating on code units, and it's just as safe, then
> > operating on code units is ok.
>
> Probably I don't understand it fully, but D approach has always
> been "safe first, fast with some additional syntax". Back to
> commonPrefix and take:
>
> ==========================
> import std.stdio, std.traits, std.algorithm, std.range;
>
> void main()
> {
>     auto beer = "Пиво";
>     auto r1 = beer.take(2);
>     auto pony = "Пони";
>     auto r2 = commonPrefix(beer, pony);
>     writeln(r1);
>     writeln(r2);
> }
> ==========================
>
> First one returns 2 symbols. Second one - 3 code points and
> broken string. There is no way such inconsistency by-default in
> standard library is understandable by a newbie.

We don't really have much choice here. As long as strings are arrays of code units, it wouldn't work to treat them as ranges of their elements, because that would be a complete disaster for Unicode. You'd be operating on code units rather than code points, which is almost always wrong. Pretty much the only way to really solve the problem, as long as strings are arrays with all of the normal array operations, is for the std.range traits (hasLength, hasSlicing, etc.) and the range functions for arrays in std.array (front, popFront, etc.) to treat strings as ranges of code points (dchar), which is what they do. The result _is_ confusing, but as long as strings are arrays of code units like they are now, doing anything else would result in incorrect behavior.

There just isn't a good solution given what strings currently are in the language itself. Andrei's suggestion would work if Walter could be talked into it, but that doesn't look like it's going to happen. Making strings structs which hold arrays of code units could also work, but without language support, it's likely to have major issues. String literals would have to become the struct type, which could cause issues with calling C functions, and the code breakage would be _way_ larger than with Andrei's suggestion, since arrays of code units would no longer be strings at all. It would be feasible, but it gets really messy.

What we have is probably about the best that we can do without actually changing the language (and Andrei's suggestion is likely the best way to do that IMHO). But a language change is unlikely to happen at this point, especially since Walter seems to view Unicode quite differently from your average programmer and expects your average programmer to actually understand it and handle it correctly (which just isn't going to happen).

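To make the array-versus-range split described above concrete, here is a minimal sketch of how I understand the traits and range primitives to behave on a narrow string. The helper names codeUnitCount and codePointCount are made up for illustration, and the std.utf.byCodeUnit call assumes a Phobos release that provides it (it is not part of the original example).

```d
import std.algorithm : equal;
import std.range;
import std.utf : byCodeUnit;

// Hypothetical helper: the array view of a string counts UTF-8 code units.
size_t codeUnitCount(string s) { return s.length; }

// Hypothetical helper: the range view of a string counts code points.
size_t codePointCount(string s) { return s.walkLength; }

void main()
{
    string beer = "Пиво";

    // As an array, a string is made of code units and has a length.
    static assert(is(typeof(beer[0]) : char));
    assert(codeUnitCount(beer) == 8); // 4 Cyrillic letters, 2 code units each

    // As a range, the traits report a forward/bidirectional range of dchar
    // with no length, no slicing and no random access.
    static assert(is(ElementType!string == dchar));
    static assert(!hasLength!string);
    static assert(!hasSlicing!string);
    static assert(!isRandomAccessRange!string);
    static assert(isBidirectionalRange!string);

    assert(beer.front == 'П');        // front decodes a whole code point
    assert(codePointCount(beer) == 4);

    // take counts range elements, so it yields two code points.
    assert(equal(beer.take(2), "Пи"));

    // Assumption: byCodeUnit is the explicit opt-in to code units.
    // Two code units cover only the first letter here.
    assert(equal(beer.byCodeUnit.take(2), "П".byCodeUnit));
}
```

The point of the sketch is only that the same variable answers "how long are you?" differently depending on whether you ask the array (length) or the range (walkLength), which is exactly the confusion under discussion.
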
The confusion could be reduced if we had not only an article on dlang.org explaining exactly what ranges are and how to use them with Phobos, but also an article (maybe the same one, maybe another) explaining what this means for strings and why. That way, it would be easier for people to educate themselves. But no one has written (or at least finished writing) such an article for dlang.org (I keep meaning to, but I never get around to it). Some stuff has been written outside of dlang.org (e.g. http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321 and http://ddili.org/ders/d.en/ranges.html ), but there's nothing on dlang.org, and I don't believe that there's really anything online, aside from stray newsgroup posts or Stack Overflow answers, which discusses why strings are the way they are with regards to ranges. And there should be.

- Jonathan M Davis