Our MySQL/MariaDB database interface class creates the database with the utf8 character set, which can store at most 3 bytes per character (in MySQL, utf8 is actually an alias for the utf8mb3 charset).
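
To illustrate the limit, here is a minimal sketch in plain Java. It is not MCF code; the class name Utf8WidthCheck and both helper methods are made up for this example. It shows that U+1F3AE needs 4 bytes in UTF-8, which is one more than a 3-byte charset can store, and how a string could be checked for such characters before it is written to a column.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of MCF).
final class Utf8WidthCheck {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the string contains a code point above U+FFFF.
    // Such code points take 4 bytes in UTF-8 and therefore do not
    // fit into a charset limited to 3 bytes per character.
    static boolean hasSupplementaryCodePoint(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // U+1F3AE, the gamepad character from the report.
        String gamepad = new String(Character.toChars(0x1F3AE));

        System.out.println(utf8Length(gamepad));                // 4
        System.out.println(hasSupplementaryCodePoint(gamepad)); // true
        System.out.println(hasSupplementaryCodePoint("title")); // false
    }
}
```

The 4 bytes are exactly the '\xF0\x9F\x8E\xAE' sequence shown in the MySQL error message.
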
I will create a ticket for changing the character set to utf8mb4 for MySQL/MariaDB. utf8mb4 supports up to 4 bytes per character, like PostgreSQL.

Some references:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
https://mariadb.com/kb/en/library/supported-character-sets-and-collations/
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html
https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

On 23.01.2019 at 09:00, Markus Schuch wrote:
> Hi,
>
> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
> situation caused by a supplementary (4-byte UTF-8) character
> (e.g. U+1F3AE) in a string inserted into one of the varchar columns.
>
> In my case a connector wrote the title of a parent document into the
> version string of the process document, which contained the character
> U+1F3AE - a gamepad :)
>
> This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
> for column 'lastversion' at row 1" in MySQL, because the utf8 character
> set does not support that kind of character (utf8mb4 does).
>
> The cause was hard to find, because it somehow led to a transaction
> abort loop in the incremental ingester and the error was not logged
> properly.
>
> My questions:
> - should we create the MySQL database with utf8mb4 by default?
> - or should such 4-byte characters be stripped from inserted strings?
> - or should 22001 be handled better?
>
> Thanks in advance
> Markus
>
