On Wed, 01 Aug 2012 19:58:46 +0200,
"Brian Schott" <[email protected]> wrote:

> On Wednesday, 1 August 2012 at 17:36:16 UTC, Walter Bright wrote:
> >
> > I suggest proposing the D lexer as an addition to Phobos. But 
> > if that is done, its interface would need to accept a range as 
> > input, and its output should be a range of tokens.
> 
> It used to be range-based, but the performance was terrible. The 
> inability to use slicing on a forward-range of characters and the 
> gigantic block on KCachegrind labeled "std.utf.decode" were the 
> reasons that I chose this approach. I wish I had saved the 
> measurements on this....

I can understand that. I was reading a dictionary file with 
readText().splitLines(); and wondering why Unicode decoding was being performed. 
Unfortunately, ranges over strings work on decoded Unicode code points, while all 
structured text files are structured by ASCII characters. These file formats are 
probably just old or were designed with some false sense of compatibility in mind, 
but it is also clear to their inventors that parsing them is easier and faster 
when single-byte characters delimit the tokens.
But we have talked about UTF-8 vs. ASCII and foreach vs. ranges before. I still 
hope for some super-smart solution that doesn't need a book of documentation 
and allows some kind of ASCII-equivalent range. I've heard that foreach over 
a UTF-8 string with a dchar loop variable implicitly decodes the string. 
While this is useful, it is also not self-explanatory and takes some 
reading up on the topic.
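
To make the foreach behaviour concrete, here is a minimal sketch in D. The 
three helper names are made up for this illustration, and using 
std.string.representation to get a byte view of a string is just one possible 
way to approximate an "ASCII-equivalent range" in current Phobos, not an 
established solution to the problem discussed here.

```d
import std.string : representation;

// Counts UTF-8 code units: a char loop variable means no decoding.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s)
        ++n;
    return n;
}

// Counts code points: a dchar loop variable makes foreach decode implicitly.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Counts '\n' delimiters on the raw bytes, without any UTF-8 decoding.
// This is safe for ASCII delimiters because bytes below 0x80 never occur
// inside a multi-byte UTF-8 sequence.
size_t countLinesBytewise(string s)
{
    size_t n;
    foreach (b; s.representation) // elements are immutable(ubyte)
        if (b == '\n')
            ++n;
    return n;
}

void main()
{
    string s = "a\u00E9\n"; // U+00E9 takes two UTF-8 code units
    assert(countCodeUnits(s) == 4);
    assert(countCodePoints(s) == 3);
    assert(countLinesBytewise(s) == 1);
}
```

The only difference between the first two loops is the declared type of the 
loop variable, which is why the decoding is easy to overlook.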

-- 
Marco
