Re: Read a unicode character from the terminal

Ali Çehreli Sat, 31 Mar 2012 11:54:43 -0700

On 03/31/2012 08:56 AM, Jacob Carlborg wrote:
> How would I read a unicode character from the terminal? I've tried using
> "std.cstream.din.getc"

I recommend using stdin. The future of std.cstream is uncertain, and stdin is sufficient. (I know that stdin lacks support for BOMs, but I don't need them.)


> but it seems to only work for ascii characters.
> If I try to read and print something that isn't ascii, it just prints a
> question mark.

The word 'character' used to mean a character of a Latin-based alphabet, but with Unicode support that is no longer the case. In D, 'character' means UTF code unit, nothing else. Unfortunately, although 'Unicode character' is exactly the right term to use, it conflicts with D's characters, which are not Unicode characters.

'Unicode code point' is the non-conflicting term that matches what we mean by 'Unicode character.' Of D's character types, only dchar can hold every code point.
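To make the distinction concrete, here is a small sketch that counts code units in each of D's string types for two letters: ç, which fits in a single wchar, and U+1D11E (the musical G clef), which needs a UTF-16 surrogate pair. Only the dstring holds each letter in one element. The letters are just illustrative choices, and the "code points" column relies on Phobos decoding narrow strings when they are walked as ranges.

```d
import std.range : walkLength;
import std.stdio : writefln;

// Prints the number of code units one letter needs in UTF-8, UTF-16
// and UTF-32, plus the number of code points a range walk sees.
void show(string s8, wstring s16, dstring s32)
{
    writefln("%s: char=%s wchar=%s dchar=%s code points=%s",
             s32, s8.length, s16.length, s32.length, s8.walkLength);
}

void main()
{
    show("ç", "ç"w, "ç"d);
    // U+1D11E is outside the Basic Multilingual Plane
    show("\U0001D11E", "\U0001D11E"w, "\U0001D11E"d);
}
```

The first line reports 2, 1 and 1 code units; the second reports 4, 2 and 1. Both report a single code point.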


That's the part about characters.

The other side is what is being fed into the program through its standard input. On my Linux consoles, the text comes as a stream of chars, i.e. UTF-8 encoded text. You must ensure that your terminal is configured to support Unicode. On Windows terminals, one must enter 'chcp 65001' to set the terminal to UTF-8.

Then, it is the program that must know what the data represents. If you are expecting a Unicode code point, you may think that it should be as simple as reading into a dchar:


import std.stdio;

void main()
{
    dchar letter;
    readf("%s", &letter);    // <-- does not work!
    writeln(letter);
}

The output:

$ ./deneme
ç
Ã  <-- will be different on different consoles

The problem is that char can be implicitly converted to dchar. Since the letter ç consists of two chars (two UTF-8 code units), the dchar receives only the first one, converted to dchar.
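The same truncation can be reproduced without the terminal. This sketch takes the first code unit of the UTF-8 string "ç" and assigns it to a dchar. No decoding happens, so the dchar holds U+00C3 (Ã), which matches the output above, assuming the console itself is UTF-8.

```d
import std.stdio : writefln;

void main()
{
    string s = "ç";        // two UTF-8 code units
    char first = s[0];     // only the first code unit
    dchar letter = first;  // implicit char -> dchar conversion; no decoding

    writefln("%X %X", cast(uint) s[0], cast(uint) s[1]); // C3 A7
    writefln("%X", cast(uint) letter);                   // C3
    writefln("%s", letter);  // U+00C3 'Ã', not 'ç'
}
```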

To see this, read and write two chars in a loop, without a newline in between:


import std.stdio;

void main()
{
    foreach (i; 0 .. 2) {
        char code;
        readf("%s", &code);
        write(code);
    }

    writeln();
}

This time two code units are read and then output, forming a single Unicode character on the console:


$ ./deneme
ç
ç   <-- result of two write(code) expressions

The solution is to use ranges when pulling Unicode characters out of strings. std.stdio's stdin does not provide this yet, but it will eventually happen (so I've heard :)).
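Here is a sketch of what the range view looks like for a string that is already in memory. For narrow strings, the range primitives in std.range.primitives decode UTF-8, so front yields a complete code point as a dchar and popFront skips all of that code point's code units. This does not read from stdin; it only illustrates the range interface mentioned above, and the sample string is an arbitrary choice.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

void main()
{
    string s = "çğü";   // 6 UTF-8 code units, 3 code points

    // front decodes UTF-8, so it yields a whole code point
    static assert(is(typeof(s.front) == dchar));

    while (!s.empty)
    {
        writeln(s.front);  // one Unicode character per line
        s.popFront();      // advances past all code units of that character
    }
}
```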


For now, this is a way of getting Unicode characters from the input:

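import std.stdio;
import std.string : chomp;

void main()
{
    // readln() returns UTF-8 code units, including the trailing '\n'
    string line = readln().chomp;

    // Declaring the loop variable as dchar makes foreach decode
    // the UTF-8 code units into Unicode code points
    foreach (dchar c; line) {
        writeln(c);
    }
}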

Once you have the input as a string, std.utf.decode can also be used.
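A minimal sketch with std.utf.decode, which decodes the code point that starts at a given index and advances that index past the code point's code units. codePoints is a hypothetical helper name used only for illustration; note that decode throws a UTFException if the input is not valid UTF-8.

```d
import std.stdio : readln, writefln;
import std.string : chomp;
import std.utf : decode;

// Hypothetical helper: decodes a UTF-8 string into its code points.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i);  // reads one code point, advances i past it
    return result;
}

void main()
{
    foreach (c; codePoints(readln().chomp))
        writefln("U+%04X %s", cast(uint) c, c);
}
```

On a UTF-8 terminal, entering çğ should print one line for U+00E7 and one for U+011F.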

Ali
