On Friday, 6 November 2015 at 19:26:50 UTC, HeiHon wrote:
Consider this:

[code]
import std.stdio, std.utf, std.exception;

void do_decode(string txt)
{
    try
    {
        size_t idx;
        writeln("decode ", txt);
        for (size_t i = 0; i < txt.length; i++)
        {
            dchar dc = std.utf.decode(txt[i..i+1], idx);
            writeln(" i=", i, " length=", txt[i..i+1].length, " char=", txt[i], " idx=", idx, " dchar=", dc);
        }
    }
    catch(Exception e)
    {
        writeln(e.msg, " file=", e.file, " line=", e.line);
    }
    writeln();
}

void main()
{
    do_decode("abc");
/+ result:
decode abc
 i=0 length=1 char=a idx=1 dchar=a
 i=1 length=1 char=b idx=2 dchar=c
 i=2 length=1 char=c idx=3 dchar=
+/

    do_decode("åbc");
/+ result:
decode åbc
Attempted to decode past the end of a string (at index 1) file=D:\dmd2\windows\bin\..\..\src\phobos\std\utf.d line=1268
+/

    do_decode("aåb");
/+ result:
decode aåb
 i=0 length=1 char=a idx=1 dchar=a
core.exception.RangeError@std\utf.d(1265): Range violation
----------------
0x004054D4
0x0040214F
0x004045A7
0x004044BB
0x00403008
0x755D339A in BaseThreadInitThunk
0x76EE9EF2 in RtlInitializeExceptionChain
0x76EE9EC5 in RtlInitializeExceptionChain
+/
}
[/code]

I would expect:
decode abc -> dchar a, dchar b, dchar c
decode åbc -> dchar å, dchar b, dchar c
decode aåb -> dchar a, dchar å, dchar b

Am I using std.utf.decode wrongly or is it buggy?

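I don't think you want to do this:

  dchar dc = std.utf.decode(txt[i..i+1], idx);

txt is UTF-8, which is a multi-byte, variable-length encoding, so a one-byte slice like txt[i..i+1] won't work: whenever a character takes more than one byte, you end up with an invalid fragment of UTF-8. Note also that idx is an index into the string you pass in, and decode advances it, so after the first call it already points past the end of a one-byte slice.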
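To make the point about variable-length encoding concrete, here is a small sketch. It assumes std.utf.stride, which reports how many code units the code point starting at a given index occupies, and it writes 'å' as the escape "\u00E5" so the result does not depend on how the source file is saved.

[code]
import std.stdio : writeln;
import std.utf : stride;

void main()
{
    // "\u00E5" is 'å'; in UTF-8 it is the two bytes 0xC3 0xA5.
    string txt = "\u00E5bc";

    // .length counts UTF-8 code units (bytes), not characters.
    assert(txt.length == 4);

    // The code point at index 0 spans two code units, the one at index 2 spans one.
    assert(stride(txt, 0) == 2);
    assert(stride(txt, 2) == 1);

    // txt[0 .. 1] therefore holds only the lead byte, which is not valid UTF-8 on its own.
    assert(cast(ubyte) txt[0] == 0xC3);

    writeln("length=", txt.length, " stride at 0=", stride(txt, 0));
}
[/code]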

You probably just want decode(txt, i) instead. According to the documentation, it decodes one code point and advances i past the code units it consumed. Pairing that with a while (i < txt.length) loop should do the trick.
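A minimal sketch of that loop, untested against the exact Phobos version in the question. The helper name decodeAll is made up for illustration; the only library call is std.utf.decode with the whole string and a single running index.

[code]
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical helper: collect every code point of a UTF-8 string.
dchar[] decodeAll(string txt)
{
    dchar[] result;
    size_t i = 0;
    while (i < txt.length)
    {
        // decode reads one code point starting at txt[i] and
        // advances i past all of its code units.
        result ~= decode(txt, i);
    }
    return result;
}

void main()
{
    // "\u00E5" is 'å', written as an escape to avoid source-encoding issues.
    assert(decodeAll("abc") == "abc"d);
    assert(decodeAll("\u00E5bc") == "\u00E5bc"d);
    assert(decodeAll("a\u00E5b") == "a\u00E5b"d);

    foreach (dc; decodeAll("a\u00E5b"))
        writeln("dchar=", dc);
}
[/code]

Because decode sees the whole string, it can read as many code units as the code point needs, and there is no manual i++ to fall out of step with the encoding.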

