My IRC bot has suddenly started crashing. It reads characters from a
Socket into a ubyte[] array, then idups parts of that (full
lines) into strings for parsing. Parsing involves slicing such
strings into meaningful segments: sender, event type, target
channel/user, message content, etc. I can assume all of them to
be char[]-compliant except for the content field.

Running it in a debugger, I see I'm tripping an assert in utf.d[1]
when calling stripRight on a content slice[2].

/++
    Returns the number of code units that are required to
    encode the code point
    $(D c) when $(D C) is the character type used to encode it.
  +/
ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
if (isSomeChar!C)
{
    static if (C.sizeof == 1)
    {
        if (c <= 0x7F) return 1;
        if (c <= 0x7FF) return 2;
        if (c <= 0xFFFF) return 3;
        if (c <= 0x10FFFF) return 4;
        assert(false); // <--
    }
    // ...

This trips it:

import std.string;

void main()
{
    string s = "\355\342\256 \342\245\341⮢\256\245
ᮮ\241饭\250\245".stripRight; // <-- asserts false
}

The real backtrace:

#0 _D3std3utf__T10codeLengthTaZQpFNaNbNiNfwZh (c=26663461) at /usr/include/dlang/dmd/std/utf.d:2530
#1 0x000055555578d7aa in _D3std6string__T10stripRightTAyaZQrFQhZ14__foreachbody2MFNaNbNiNfKmKwZi (this=0x7fffffff99c0, __applyArg1=@0x7fffffff9978: 26663461, __applyArg0=@0x7fffffff9970: 17) at /usr/include/dlang/dmd/std/string.d:2918
#2 0x00007ffff7a47014 in _aApplyRcd2 () from /usr/lib/libphobos2.so.0.78
#3 0x000055555578d731 in _D3std6string__T10stripRightTAyaZQrFNaNiNfQnZQq (str=...) at /usr/include/dlang/dmd/std/string.d:2915
#4 0x00005555558e0cc7 in _D8kameloso3irc17parseSpecialcasesFNaNfKSQBnQBh9IRCParserKSQCf7ircdefs8IRCEventKAyaZv (slice=..., event=...,parser=...) at source/kameloso/irc.d:1184

Should that not be an Exception, as it's based on input? I'm not
sure where the character 26663461 came from (that is 0x196DA25,
well above the 0x10FFFF limit codeLength checks for). Even so,
should it assert?

I don't know what to do right now. I'd like to avoid sanitizing
all lines. I could catch an Exception but not so much an
AssertError.
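
For reference, one stopgap I'm considering is to validate only the content field before stripping it, since std.utf.validate throws a catchable UTFException on malformed UTF-8. This is just a minimal sketch: safeStripRight is a hypothetical helper name, not something that exists in Phobos or in the bot, and returning the content unstripped on failure is only one possible fallback.

```d
import std.string : stripRight;
import std.utf : UTFException, validate;

/// Hypothetical helper (not part of Phobos or kameloso): strips trailing
/// whitespace only when `content` is well-formed UTF-8.
string safeStripRight(string content)
{
    try
    {
        // validate decodes the whole string forwards and throws
        // UTFException on the first malformed sequence.
        validate(content);
        return content.stripRight;
    }
    catch (UTFException e)
    {
        // Assumed fallback: hand the content back untouched rather than
        // letting stripRight decode garbage.
        return content;
    }
}
```

That would only touch the one field I can't trust, rather than every line, though it still costs an extra pass over each content slice.
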
[1]: https://github.com/dlang/phobos/blob/master/std/utf.d#L2522
[2]: https://github.com/zorael/kameloso/blob/master/source/kameloso/irc.d#L1184