Re: Reading dchar from UTF-8 stdin

spir Wed, 16 Mar 2011 02:54:26 -0700

On 03/15/2011 11:33 PM, Ali Çehreli wrote:

Given that the input stream is UTF-8, it is understandable that the following
program pulls just one code unit from the standard input (I think the console
encoding is UTF-8 on my Ubuntu 10.10):


import std.stdio;

void main()
{
    char code;
    readf(" %s", &code);
    writeln(code); // <-- may write an incomplete character
}

ö is represented by two bytes in the UTF-8 encoding. When ö is fed to the input
of the program, the writeln expression does not produce a complete character on
the output. That's understandable with char.
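
For illustration only (this is not part of the original exchange), a minimal sketch that prints the UTF-8 code units behind ö. It assumes a current Phobos where std.string.representation is available; the escape \u00F6 is used so the example does not depend on the source file's encoding.

```d
import std.stdio;
import std.string : representation;

void main()
{
    string s = "\u00F6";              // 'ö', a single code point (U+00F6)
    auto units = s.representation;    // the underlying UTF-8 code units (ubyte)
    writeln(units.length);            // number of code units, not code points
    foreach (u; units)
        writefln("0x%02X", u);        // one line per code unit
}
```

A char holds exactly one of these code units, which is why reading into a char can only ever capture half of ö.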

Would you expect all of the bytes to be consumed when a dchar was used instead?

import std.stdio;

void main()
{
    dchar code; // <-- now a dchar
    readf(" %s", &code);
    writeln(code); // <-- BUG: uses a code unit as a code point!
}

Well, when I try to run that bit of code, I get an error in std.format.formattedRead (line near the end, marked with "***" below).


void formattedRead(R, Char, S...)(ref R r, const(Char)[] fmt, S args)
{
    auto spec = FormatSpec!Char(fmt);
    static if (!S.length)
    {
        spec.readUpToNextSpec(r);
        enforce(spec.trailing.empty);
    }
    else
    {
        // The function below accounts for '*' == fields meant to be
        // read and skipped
        void skipUnstoredFields()
        {
            for (;;)
            {
                spec.readUpToNextSpec(r);
                if (spec.width != spec.DYNAMIC) break;
                // must skip this field
                skipData(r, spec);
            }
        }

        skipUnstoredFields();
        alias typeof(*args[0]) A;
        static if (isTuple!A)
        {
            foreach (i, T; A.Types)
            {
                //writeln("Parsing ", r, " with format ", fmt);
                (*args[0])[i] = unformatValue!(T)(r, spec);
                skipUnstoredFields();
            }
        }
        else
        {
            *args[0] = unformatValue!(A)(r, spec);              // ***
        }
        return formattedRead(r, spec.trailing, args[1 .. $]);
    }
}

When the input is ö, now the output becomes Ã.
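
A likely explanation for the Ã, sketched under the assumption that only the first code unit of ö (0xC3) is read and then widened to a dchar: U+00C3 is exactly the code point of Ã. The snippet below reproduces that effect without touching stdin; it does not claim to show what readf does internally.

```d
import std.stdio;

void main()
{
    string s = "\u00F6";      // 'ö' encoded in UTF-8 as 0xC3 0xB6
    char first = s[0];        // only the first code unit
    dchar wrong = first;      // implicit widening treats 0xC3 as U+00C3
    assert(wrong == '\u00C3');
    writeln(wrong);           // prints Ã
}
```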

What would you expect to happen?


I would expect a whole code point representing 'ö'.
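
One possible workaround, offered as a sketch rather than a fix for readf: read the line as a string and decode its first code point with std.utf.decode. The helper name firstCodePoint is hypothetical, and unlike the " %s" format this version does not skip leading whitespace.

```d
import std.stdio;
import std.utf : decode;

/// Hypothetical helper: decode the first code point of a non-empty UTF-8 string.
dchar firstCodePoint(string s)
{
    size_t i = 0;
    return decode(s, i);   // consumes every code unit of the first code point
}

void main()
{
    string line = readln();            // raw UTF-8 input, up to the newline
    if (line.length)
        writeln(firstCodePoint(line)); // a complete character, e.g. ö
}
```

decode throws a UTFException on malformed input, so invalid bytes are reported instead of being passed through silently.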

Ali

P.S. As what is written is not the same as what is read above, I am reminded of
another issue: would you expect the strings "false" and "true" to be accepted
as correct inputs when readf'ed to bool variables?


Yep!

Denis
--
_________________
vita es estrany
spir.wikidot.com
