Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML

Nordlöw Sun, 15 Jun 2014 16:11:26 -0700

I'm using the following snippet to convert a UTF-8 string to HTML:


import std.traits: isSomeChar; // needed by the template constraint below

/** Convert character $(D c) to HTML representation. */
string toHTML(C)(C c) @safe pure if (isSomeChar!C)
{
    import std.conv: to;
    if      (c == '&')  return "&amp;"; // ampersand
    else if (c == '<')  return "&lt;"; // less than
    else if (c == '>')  return "&gt;"; // greater than
    else if (c == '\"') return "&quot;"; // double quote
    else if (0 < c && c < 128)
        return to!string(cast(char)c);
    else
        return "&#" ~ to!string(cast(int)c) ~ ";";
}

static if (__VERSION__ >= 2066L)
{
    /** Convert string $(D s) to HTML representation. */
    auto encodeHTML(string s) @safe pure
    {
        import std.utf: byDchar;
        import std.algorithm: joiner, map;
        return s.byDchar.map!toHTML.joiner("");
    }
}

Note that it uses Walter's new std.utf.byDchar.

But it triggers

core.exception.RangeError@std/utf.d(2703): Range violation
----------------
Stack trace:
#1: ?? line (0)
#2: ?? line (0)
#3: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (2703)
#4: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (3232)
#5: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (510)
#6: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3440)
#7: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3540)
#8: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/range.d line (1861)
#9: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2172)
#10: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2843)
#11: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (3167)
#12: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (526)
#13: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/stdio.d line (1168)


for non-UTF-8 input.

Is this intentional?

utf.d on line 2703 is inside byCodeUnit().
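
One way to confirm that malformed input is the trigger is to validate the string before encoding it. The following is only a minimal sketch, assuming std.utf.validate (which throws UTFException on malformed UTF-8) is acceptable for this purpose; isValidUTF8 is a hypothetical helper name, not part of Phobos.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool isValidUTF8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

unittest
{
    assert(isValidUTF8("a < b"));
    assert(!isValidUTF8("\xff")); // the byte 0xFF never occurs in valid UTF-8
}
```

Input that fails this check could then be rejected or sanitized before it reaches encodeHTML.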

When I use byChar() it doesn't crash, but then I get incorrect conversions.

Could somebody explain the difference between byChar, byWchar and byDchar?
