Re: TDPL: Foreach over Unicode string

Sean Kelly Tue, 27 Jul 2010 15:35:17 -0700

Andrej Mitrovic Wrote:

> On page 123 there's an example of what happens when traversing a unicode 
> string with a char, and on the next page the string is traversed with a 
> dchar, which should fix the output. But I'm getting different results, here's 
> the code and output of the two samples:
> 
> import std.stdio;
> 
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
> 
> Prints:
> [H][a][l][l][Ã][¥][,][ ][V][Ã][¤][r][l][d][!]
> 
> Second example:
> 
> import std.stdio;
> 
> void main() {
>     string str = "Hall\u00E5, V\u00E4rld!";
>     foreach (dchar c; str) {
>         write('[', c, ']');
>     }
>     writeln();
> }
> 
> Prints:
> [H][a][l][l][Ã¥][,][ ][V][Ã¤][r][l][d][!]
> 
> 
> The second example should print out:
> [H][a][l][l][å][,][ ][V][ä][r][l][d][!] 
> 
> This is on DMD 2.047 on Windows.


I think the problem is Windows integration. On OSX I get:

[H][a][l][l][?][?][,][ ][V][?][?][r][l][d][!]
[H][a][l][l][å][,][ ][V][ä][r][l][d][!]

which is essentially correct: the first line shows a placeholder for each 
lone UTF-8 code unit, and the second shows the decoded characters.  The only 
difference between this and doing the same thing in C with printf() in place 
of write() is that both lines display correctly in C.  I think printf() must 
be detecting partial UTF-8 characters and buffering until the complete 
sequence has arrived.  Interestingly, the C output can't even be broken by 
badly timed calls to fflush(), so the buffering is happening at a fairly high 
level.  I'd be interested in seeing the same thing in write() at some point.
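
To make the difference between the two loops concrete, here is a minimal 
sketch that counts the iterations of each foreach and then groups the code 
units into complete UTF-8 sequences before writing them.  The helper names 
(countCodeUnits, countCodePoints, wholeSequences) are made up for 
illustration, and the grouping only approximates the buffering guessed at 
above; it is not how write() or printf() is actually implemented.

```d
import std.stdio;
import std.utf : stride;

// Hypothetical helper: iterations when the loop variable is a UTF-8 code unit.
size_t countCodeUnits(string s) {
    size_t n;
    foreach (char c; s) ++n;
    return n;
}

// Hypothetical helper: iterations when foreach decodes to code points.
size_t countCodePoints(string s) {
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

// Hypothetical helper: one slice per complete UTF-8 sequence, so a
// multi-byte character is never handed to the output in pieces.
string[] wholeSequences(string s) {
    string[] parts;
    size_t i = 0;
    while (i < s.length) {
        // stride gives the length, in code units, of the sequence at index i.
        immutable n = stride(s, i);
        parts ~= s[i .. i + n];
        i += n;
    }
    return parts;
}

void main() {
    string str = "Hall\u00E5, V\u00E4rld!";
    writeln(countCodeUnits(str));   // 15
    writeln(countCodePoints(str));  // 13
    foreach (part; wholeSequences(str))
        write('[', part, ']');
    writeln();
}
```

The two counts differ by two because \u00E5 and \u00E4 each take two code 
units in UTF-8, which matches the 15 bracketed items in the first sample's 
output and the 13 in the second.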
