Our MySQL/MariaDB database interface class creates the database with the
utf8 character set, which can store at most 3 bytes per character (in
MySQL, utf8 is actually an alias for the utf8mb3 character set).
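To make the 3-byte limit concrete, here is a small Java sketch (the class
and method names are illustrative only, not existing MCF code) that counts
how many UTF-8 bytes a single code point needs and whether it would fit
into a utf8mb3 column:

```java
import java.nio.charset.StandardCharsets;

// Sketch: UTF-8 width of a code point versus MySQL's 3-byte utf8 (utf8mb3).
final class Utf8Width {
    // Encode one code point as UTF-8 and return the byte count (1 to 4).
    static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // utf8mb3 stores at most 3 bytes per character.
    static boolean fitsUtf8mb3(int codePoint) {
        return utf8Bytes(codePoint) <= 3;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F3AE}; // A, e-acute, euro sign, gamepad
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d bytes, fits utf8mb3: %b%n",
                    cp, utf8Bytes(cp), fitsUtf8mb3(cp));
        }
    }
}
```

Only the last sample, U+1F3AE, needs 4 bytes, so it is the only one that
utf8mb3 rejects.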

I will create a ticket to change the character set to utf8mb4 for
MySQL/MariaDB. utf8mb4 supports up to 4 bytes per character, as
PostgreSQL's UTF8 encoding does.
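For reference, a sketch of the DDL such a change might boil down to. The
class, method, database and table names are hypothetical (this is not the
actual interface class), and utf8mb4_unicode_ci is just one reasonable
collation choice; the CREATE DATABASE ... CHARACTER SET ... COLLATE and
ALTER TABLE ... CONVERT TO CHARACTER SET statements are standard
MySQL/MariaDB syntax:

```java
// Sketch: DDL strings for creating or converting to utf8mb4.
// Names are hypothetical placeholders, not ManifoldCF's real schema.
final class Utf8mb4Ddl {
    static final String CHARSET = "utf8mb4";
    static final String COLLATION = "utf8mb4_unicode_ci"; // one possible choice

    // Backtick-quote an identifier, doubling any embedded backticks.
    static String quoteIdent(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // New database: tables created later without an explicit charset inherit it.
    static String createDatabase(String dbName) {
        return "CREATE DATABASE " + quoteIdent(dbName)
                + " CHARACTER SET " + CHARSET + " COLLATE " + COLLATION;
    }

    // Existing table: converts the table default and all its character columns.
    static String convertTable(String tableName) {
        return "ALTER TABLE " + quoteIdent(tableName)
                + " CONVERT TO CHARACTER SET " + CHARSET + " COLLATE " + COLLATION;
    }

    public static void main(String[] args) {
        System.out.println(createDatabase("mcf_example"));
        System.out.println(convertTable("example_table"));
    }
}
```

Changing the CREATE DATABASE statement alone only helps new installations;
existing databases keep their tables' current character set until each
table is converted.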

Some references:

https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql

https://mariadb.com/kb/en/library/supported-character-sets-and-collations/

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

On 23.01.2019 at 09:00, Markus Schuch wrote:
> Hi,
> 
> while using MySQL/MariaDB for MCF I encountered a "deadlock"-like
> situation caused by a supplementary character (e.g. U+1F3AE, a surrogate
> pair in UTF-16) in a string inserted into one of the varchar columns.
> 
> In my case a connector wrote the title of a parent document into the
> version string of the processed document, and that title contained the
> character U+1F3AE - a gamepad :)
> 
> This led to SQL error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
> for column 'lastversion' at row 1" in MySQL, because the utf8 character
> set does not support such characters (utf8mb4 does).
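The bytes in that message are simply the UTF-8 encoding of U+1F3AE. Java
holds the character as a UTF-16 surrogate pair (two chars), but it is sent
to the server as four UTF-8 bytes, which utf8mb3 cannot store. A small
sketch that reproduces the byte sequence in the same '\xNN' notation (the
helper name is made up for illustration):

```java
import java.nio.charset.StandardCharsets;

// Sketch: print a string's UTF-8 bytes in MySQL's '\xNN' error-message style.
final class MysqlByteView {
    static String toEscapedHex(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            out.append(String.format("\\x%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String gamepad = new String(Character.toChars(0x1F3AE));
        System.out.println(gamepad.length());        // 2 UTF-16 chars
        System.out.println(toEscapedHex(gamepad));   // the 4 bytes from the error
    }
}
```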
> 
> The cause was hard to find, because somehow it led to a transaction
> abort loop in the incremental ingester and the error was not logged
> properly.
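On the logging side, a minimal sketch (a hypothetical helper, not existing
MCF code) that describes a java.sql.SQLException with its SQLState, vendor
error code and any chained exceptions, and flags SQLSTATE 22001 ("string
data, right truncation", the state reported above) explicitly:

```java
import java.sql.SQLException;

// Sketch: make string-data errors visible instead of losing them in a retry loop.
final class SqlErrorDescriber {
    static final String STRING_DATA_RIGHT_TRUNCATION = "22001";

    // True if any exception in the getNextException() chain has SQLSTATE 22001.
    static boolean isStringDataError(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (STRING_DATA_RIGHT_TRUNCATION.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }

    // One line per chained exception, plus a hint for 22001.
    static String describe(SQLException e) {
        StringBuilder sb = new StringBuilder();
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            sb.append("SQLState=").append(cur.getSQLState())
              .append(" errorCode=").append(cur.getErrorCode())
              .append(" message=").append(cur.getMessage()).append('\n');
        }
        if (isStringDataError(e)) {
            sb.append("Hint: a value was rejected by the column; check the column"
                    + " character set (utf8mb3 rejects 4-byte characters).\n");
        }
        return sb.toString();
    }
}
```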
> 
> My question:
> - should we create the MySQL database with utf8mb4 by default?
> - or should inserted strings be stripped of 4-byte (supplementary) characters?
> - or should error 22001 be handled better?
> 
> Thanks in advance
> Markus
> 
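If we went the sanitizing route instead, a sketch could look like this
(the class name is hypothetical). It replaces every supplementary code
point, i.e. everything that needs 4 bytes in UTF-8, with U+FFFD, which
needs only 3:

```java
// Sketch of option 2: make a string storable in a utf8mb3 column by
// replacing code points above U+FFFF with U+FFFD REPLACEMENT CHARACTER.
final class Utf8mb3Sanitizer {
    static final int REPLACEMENT = 0xFFFD;

    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // codePoints() combines surrogate pairs, so each emoji is seen once.
        s.codePoints().forEach(cp ->
                out.appendCodePoint(Character.isSupplementaryCodePoint(cp) ? REPLACEMENT : cp));
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Game " + new String(Character.toChars(0x1F3AE)) + " pad";
        System.out.println(sanitize(title));
    }
}
```

This silently alters data such as version strings, and different inputs
can collapse to the same sanitized value, which is one reason switching to
utf8mb4 looks like the cleaner fix.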
