Re: std.utf.decode behaves unexpectedly - Bug?

Spacen Jasset via Digitalmars-d-learn Fri, 06 Nov 2015 11:45:36 -0800

On Friday, 6 November 2015 at 19:26:50 UTC, HeiHon wrote:

Consider this:


[code]
import std.stdio, std.utf, std.exception;

void do_decode(string txt)
{
    try
    {
        size_t idx;
        writeln("decode ", txt);
        for (size_t i = 0; i < txt.length; i++)
        {
            dchar dc = std.utf.decode(txt[i..i+1], idx);

            writeln(" i=", i, " length=", txt[i..i+1].length, " char=", txt[i], " idx=", idx, " dchar=", dc);

        }
    }
    catch(Exception e)
    {
        writeln(e.msg, " file=", e.file, " line=", e.line);
    }
    writeln();
}

void main()
{
    do_decode("abc");
/+ result:
decode abc
 i=0 length=1 char=a idx=1 dchar=a
 i=1 length=1 char=b idx=2 dchar=c
 i=2 length=1 char=c idx=3 dchar=
+/

    do_decode("åbc");
/+ result:
decode åbc

Attempted to decode past the end of a string (at index 1) file=D:\dmd2\windows\bin\..\..\src\phobos\std\utf.d line=1268

+/

    do_decode("aåb");
/+ result:
decode aåb
 i=0 length=1 char=a idx=1 dchar=a
core.exception.RangeError@std\utf.d(1265): Range violation
----------------
0x004054D4
0x0040214F
0x004045A7
0x004044BB
0x00403008
0x755D339A in BaseThreadInitThunk
0x76EE9EF2 in RtlInitializeExceptionChain
0x76EE9EC5 in RtlInitializeExceptionChain
+/
}
[/code]

I would expect:
decode abc -> dchar a, dchar b, dchar c
decode åbc -> dchar å, dchar b, dchar c
decode aåb -> dchar a, dchar å, dchar b

Am I using std.utf.decode wrongly or is it buggy?


I wouldn't have thought you would want to do this:

  dchar dc = std.utf.decode(txt[i..i+1], idx);

since txt is UTF-8, which is a multi-byte, variable-length encoding, txt[i..i+1] won't work: you will end up with invalid chops of UTF-8.
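
To illustrate the point, here is a small sketch (not from the original post) that uses std.utf.stride to show how many UTF-8 code units each character of "aåb" occupies. It assumes the source file is saved as UTF-8.

[code]
import std.stdio : writeln;
import std.utf : stride;

void main()
{
    string txt = "aåb";

    // .length counts UTF-8 code units (bytes), not characters.
    writeln("length = ", txt.length);

    // stride reports how many code units the code point at an index uses.
    writeln("stride at 0 = ", stride(txt, 0)); // 'a'
    writeln("stride at 1 = ", stride(txt, 1)); // 'å'

    // txt[1..2] therefore holds only the first half of 'å',
    // which is not a valid UTF-8 sequence on its own.
}
[/code]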

It would seem that you might want to just say decode(txt, i) instead. If you look at the documentation, it should decode one code point and advance i by the right number of code units. In other words, perhaps that paired with a while (i < txt.length) might do the trick.
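
Something along these lines is what I have in mind. This is only a sketch: decodeAll is a made-up helper name, not anything from Phobos, and the source file is assumed to be UTF-8. Note also that in the quoted code idx is never reset between iterations, so after the first call it already points past the end of the one-unit slice.

[code]
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical helper: collects every code point of txt into an array.
dchar[] decodeAll(string txt)
{
    dchar[] result;
    size_t i = 0;
    while (i < txt.length)
    {
        // decode reads one code point starting at index i and advances i
        // past all of its code units, so no manual i++ is needed.
        result ~= decode(txt, i);
    }
    return result;
}

void main()
{
    foreach (s; ["abc", "åbc", "aåb"])
        writeln("decode ", s, " -> ", decodeAll(s));
}
[/code]

If all you need is to walk the code points, foreach (dchar dc; txt) does the same decoding without the explicit index.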
