Our MySQL/MariaDB database interface class creates the database with the utf8 character set, which can store at most 3 bytes per character (in MySQL, utf8 is actually an alias for the utf8mb3 charset).
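
To make the 3-byte limit concrete, here is a small Python sketch (not MCF code) showing that the gamepad character from the report quoted below needs 4 bytes in UTF-8. The helper name fits_in_utf8mb3 is made up for illustration, and it assumes the only constraint is that a character must lie in the Basic Multilingual Plane (code point at most U+FFFF).

```python
# Illustrative sketch only; nothing here is part of ManifoldCF.
# U+1F3AE is the gamepad character from the report quoted below.
GAMEPAD = "\U0001F3AE"


def utf8_len(ch: str) -> int:
    """Number of bytes the given text occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 UTF-8 bytes.

    Assumption: code points up to U+FFFF encode to 1-3 bytes, anything
    above needs 4 bytes and is rejected by a utf8 (utf8mb3) column.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


print(GAMEPAD.encode("utf-8"))              # the four bytes from the error message
print(utf8_len(GAMEPAD))                    # byte count of the gamepad character
print(fits_in_utf8mb3("caf\u00e9"))         # 2-byte character, fits
print(fits_in_utf8mb3("title " + GAMEPAD))  # contains a 4-byte character
```

The byte sequence printed for the gamepad is the same '\xF0\x9F\x8E\xAE' that MySQL reports in the "Incorrect string value" error.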
I will create a ticket for changing the character set to utf8mb4 for MySQL/MariaDB. utf8mb4 supports 4 bytes per character, like PostgreSQL.

Some references:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
https://mariadb.com/kb/en/library/supported-character-sets-and-collations/
https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html
https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

On 23.01.2019 at 10:07, Karl Wright wrote:
> It's critical, with Manifold, that the database instance be capable of
> handling any characters it's likely to encounter. For PostgreSQL we tell
> people to install it with the UTF-8 encoding, for instance, and when we
> create database instances ourselves we try to specify that as well. For
> MariaDB, have a look at the database implementation we've got, and let me
> know if this is something we're missing anywhere?
>
> Thanks,
> Karl
>
>
> On Wed, Jan 23, 2019 at 3:00 AM Markus Schuch <[email protected]> wrote:
>
>> Hi,
>>
>> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
>> situation caused by a character outside the BMP (e.g. U+1F3AE) in a
>> String inserted into one of the varchar columns.
>>
>> In my case a connector wrote the title of a parent document into the
>> version string of the process document, which contained the character
>> U+1F3AE - a gamepad :)
>>
>> This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
>> for column 'lastversion' at row 1" in MySQL, because the utf8 character
>> set does not support that kind of character (utf8mb4 does).
>>
>> The cause was hard to find, because it somehow led to a transaction
>> abort loop in the incremental ingester and the error was not logged
>> properly.
>>
>> My questions:
>> - should we create the MySQL database with utf8mb4 by default?
>> - or should inserted strings be sanitized to remove such characters?
>> - or should 22001 be handled better?
>>
>> Thanks in advance
>> Markus
>>
>
