My IRC bot has suddenly started crashing. It reads bytes from a Socket into a ubyte[] array, then idups parts of that (full lines) into strings for parsing. Parsing involves slicing such strings into meaningful segments: sender, event type, target channel/user, message content, etc. I can assume all of them are char[]-compliant (valid UTF-8) except for the content field.

Running it in a debugger I see I'm tripping an assert in utf.d[1] when calling stripRight on a content slice[2].

/++
Returns the number of code units that are required to encode the code point
    $(D c) when $(D C) is the character type used to encode it.
  +/
ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
if (isSomeChar!C)
{
    static if (C.sizeof == 1)
    {
        if (c <= 0x7F) return 1;
        if (c <= 0x7FF) return 2;
        if (c <= 0xFFFF) return 3;
        if (c <= 0x10FFFF) return 4;
        assert(false);  // <--
    }
    // ...

This trips it:

import std.string;

void main()
{
    string s = "\355\342\256 \342\245\341⮢\256\245 ᮮ\241饭\250\245".stripRight; // <-- asserts false
}

The real backtrace:
#0 _D3std3utf__T10codeLengthTaZQpFNaNbNiNfwZh (c=26663461) at /usr/include/dlang/dmd/std/utf.d:2530
#1 0x000055555578d7aa in _D3std6string__T10stripRightTAyaZQrFQhZ14__foreachbody2MFNaNbNiNfKmKwZi (this=0x7fffffff99c0, __applyArg1=@0x7fffffff9978: 26663461, __applyArg0=@0x7fffffff9970: 17) at /usr/include/dlang/dmd/std/string.d:2918
#2 0x00007ffff7a47014 in _aApplyRcd2 () from /usr/lib/libphobos2.so.0.78
#3 0x000055555578d731 in _D3std6string__T10stripRightTAyaZQrFNaNiNfQnZQq (str=...) at /usr/include/dlang/dmd/std/string.d:2915
#4 0x00005555558e0cc7 in _D8kameloso3irc17parseSpecialcasesFNaNfKSQBnQBh9IRCParserKSQCf7ircdefs8IRCEventKAyaZv (slice=..., event=...,parser=...) at source/kameloso/irc.d:1184


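Should that not be an Exception, as it's based on input? I'm not sure where the character 26663461 came from; it is far above the 0x10FFFF maximum, which is why none of the checks in codeLength match and it falls through to the assert. Even so, should it assert?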

I don't know what to do right now. I'd like to avoid sanitizing all lines. I could catch an Exception but not so much an AssertError.
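
One thing I could try, as a minimal sketch only: run std.utf.validate on just the content slice before stripping it, since validate throws a catchable UTFException on malformed UTF-8 instead of asserting. The helper name safeStripRight is made up for illustration (it is not in Phobos or in my bot), and returning the slice untouched on failure is an assumption about what I'd want.

```d
import std.string : stripRight;
import std.utf : validate, UTFException;

/// Hypothetical helper: strips trailing whitespace only when `content`
/// is valid UTF-8; otherwise returns the slice unchanged.
string safeStripRight(string content)
{
    try
    {
        // Throws UTFException if `content` is not well-formed UTF-8.
        validate(content);
    }
    catch (UTFException e)
    {
        // Assumption: leave malformed content as-is rather than guess at a repair.
        return content;
    }

    return content.stripRight;
}

void main()
{
    // 0xED starts a three-byte sequence, but 0xE2 is not a continuation byte.
    immutable(ubyte)[] raw = [0xED, 0xE2, 0xAE, 0x20];
    auto bad = cast(string) raw;

    assert(safeStripRight(bad) == bad);
    assert(safeStripRight("hello  ") == "hello");
}
```

That would only validate the one field I can't trust, rather than sanitizing every line.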


[1]: https://github.com/dlang/phobos/blob/master/std/utf.d#L2522
[2]: https://github.com/zorael/kameloso/blob/master/source/kameloso/irc.d#L1184
