I'm using the following snippet to convert a UTF-8 string to HTML:

import std.traits: isSomeChar;

/** Convert character $(D c) to HTML representation. */
string toHTML(C)(C c) @safe pure if (isSomeChar!C)
{
    import std.conv: to;
    if      (c == '&')  return "&amp;"; // ampersand
    else if (c == '<')  return "&lt;"; // less than
    else if (c == '>')  return "&gt;"; // greater than
    else if (c == '\"') return "&quot;"; // double quote
    else if (0 < c && c < 128)
        return to!string(cast(char)c);
    else
        return "&#" ~ to!string(cast(int)c) ~ ";";
}

static if (__VERSION__ >= 2066L)
{
    /** Convert string $(D s) to HTML representation. */
    auto encodeHTML(string s) @safe pure
    {
        import std.utf: byDchar;
        import std.algorithm: joiner, map;
        return s.byDchar.map!toHTML.joiner("");
    }
}

Note that it uses Walter's new std.utf.byDchar.

But it triggers

core.exception.RangeError@std/utf.d(2703): Range violation
----------------
Stack trace:
#1: ?? line (0)
#2: ?? line (0)
#3: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (2703)
#4: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (3232)
#5: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (510)
#6: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3440)
#7: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3540)
#8: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/range.d line (1861)
#9: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2172)
#10: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2843)
#11: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (3167)
#12: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (526)
#13: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/stdio.d line (1168)

for non-UTF-8 input.

Is this intentional?

utf.d on line 2703 is inside byCodeUnit().
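
For reference, a minimal sketch of how the input could be checked before it reaches encodeHTML, assuming that rejecting malformed input up front is acceptable. std.utf.validate throws a UTFException on malformed input; isValidUTF8 is a hypothetical helper name, not part of Phobos.

```d
import std.utf : validate, UTFException;

/// Returns true if `s` is well-formed UTF-8.
/// Hypothetical helper for illustration; not part of Phobos.
bool isValidUTF8(const(char)[] s) @safe pure
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    assert(isValidUTF8("héllo"));

    // The byte 0xFF never appears in well-formed UTF-8.
    const(char)[] bad = [cast(char) 'a', cast(char) 0xFF];
    assert(!isValidUTF8(bad));
}
```

This only detects bad input; what to do with it (skip, replace, report) is a separate decision.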

When I use byChar() it doesn't crash, but then I get incorrect conversions.
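
A small sketch of what is probably behind the incorrect conversions, assuming byChar yields UTF-8 code units while byDchar yields decoded code points. The helper names codeUnits and codePoints are made up for this illustration.

```d
import std.algorithm : map;
import std.array : array;
import std.utf : byChar, byDchar;

/// Numeric values of the UTF-8 code units of `s` (hypothetical helper).
uint[] codeUnits(string s)
{
    return s.byChar.map!(c => cast(uint) c).array;
}

/// Numeric values of the decoded code points of `s` (hypothetical helper).
uint[] codePoints(string s)
{
    return s.byDchar.map!(c => cast(uint) c).array;
}

void main()
{
    // "é" is U+00E9, stored in UTF-8 as the two bytes 0xC3 0xA9.
    assert(codeUnits("é") == [0xC3u, 0xA9u]); // two elements, one per byte
    assert(codePoints("é") == [0xE9u]);       // one element, the code point
}
```

With toHTML mapped over byChar, each of those two bytes falls into the final else branch on its own, giving "&#195;&#169;" rather than the intended "&#233;".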

Could somebody explain the difference between byChar, byWchar and byDchar?
