On Tuesday, February 05, 2013 22:51:32 Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte
> as input, and carry its own decoding. In the first approximation it may
> even require a random-access range of ubyte.

Another big issue is the fact that in some ways, using a pointer like dmd's lexer does is actually superior to using a range. In particular, it's trivial to determine where in the text a token is, because you can simply subtract the initial pointer from the pointer in the token. Strings would be okay too, because you can subtract their ptr properties.

But the closest that you'll get with ranges is to subtract their lengths, and the only ranges that are likely to define length are random-access ranges. And to do that, you'd either have to keep calculating the index for each token as it's generated or save the range with every token (rather than just having a pointer) so that you could determine the index later if you needed to. And depending on the range, all of that saving could be expensive.

And for any other type of range, you'd literally have to count the code units as you iterated in order to figure out what the index is (and you'd have to keep saving the range as you went along if you wanted to slice it at all, since it wouldn't actually be sliceable, and so getting to a particular index in the range would be very expensive even if you kept track of it).

And for syntax highlighting and some error reporting and a variety of other uses, you need to be able to determine where in the text a token was (not just its column and line number). And that's simply a lot easier with a pointer.

So, dealing with generic ranges is a bit problematic - far more so than any issues with character types. If the lexer is well-written, the extra overhead can be obviated by having the lexer function do stuff a bit differently depending on the type of the range, but regardless, restricting it to strings or pointers would be cleaner in that regard. It's not quite a use case where ranges shine - especially when efficiency is a top priority.

- Jonathan M Davis
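
P.S. To make the offset argument a bit more concrete, here's a rough sketch of the two situations. It is not taken from dmd's lexer or from any proposed lexer module; offsetOf and Counted are hypothetical names made up purely for illustration, and the sketch assumes that the input is handled as code units (a string, or its ubyte representation). With a string, a token is just a slice, and its position falls out of pointer subtraction. With a generic input range, about all you can do is count code units as you pop them.

```d
import std.range.primitives : empty, front, isInputRange, popFront;
import std.string : representation;

// String case: the token is a slice of the source, so its offset is
// just the difference between the two ptr properties.
// (offsetOf is a hypothetical helper, not a Phobos function.)
size_t offsetOf(const(char)[] source, const(char)[] token)
{
    return token.ptr - source.ptr;
}

// Generic case: a hypothetical wrapper that counts elements as they
// are popped, since a plain input range has no ptr and usually no length.
struct Counted(R)
    if (isInputRange!R)
{
    R range;
    size_t index; // number of elements consumed so far

    @property bool empty() { return range.empty; }
    @property auto front() { return range.front; }
    void popFront() { range.popFront(); ++index; }
}

void main()
{
    string source = "int x = 42;";

    // Slicing gives the token; no bookkeeping was needed to find it later.
    auto token = source[4 .. 5];
    assert(token == "x");
    assert(offsetOf(source, token) == 4);

    // The same lookup through a counting wrapper over the code units.
    auto units = source.representation;
    auto bytes = Counted!(typeof(units))(units);
    while (!bytes.empty && bytes.front != 'x')
        bytes.popFront();
    assert(bytes.index == 4);
}
```

The wrapper only gets you the index. Getting back to that position later, or slicing out the token text, would still require saving the range at each token, which is the extra cost described above.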
