Our MySQL/MariaDB database interface class creates the database with the
utf8 character set, which can store at most 3 bytes per character (in
MySQL, utf8 is actually an alias for the utf8mb3 charset).

I will create a ticket to change the character set to utf8mb4 for
MySQL/MariaDB. utf8mb4 supports up to 4 bytes per character, like
PostgreSQL's UTF8 encoding.
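
To illustrate the limit, here is a minimal Java sketch. The class and
method names are hypothetical and not part of the ManifoldCF code base. It
shows that U+1F3AE takes 4 bytes in UTF-8, and how one might detect
strings that would not fit into a utf8mb3 column:

```java
import java.nio.charset.StandardCharsets;

public class Utf8mb4Check {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the Basic Multilingual Plane,
    // i.e. needs 4 bytes in UTF-8 and cannot be stored in utf8mb3.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String gamepad = new String(Character.toChars(0x1F3AE));
        System.out.println(utf8Length(gamepad));      // prints 4
        System.out.println(needsUtf8mb4(gamepad));    // prints true
        System.out.println(needsUtf8mb4("Grüße"));    // prints false
    }
}
```

A check like this would only help with logging or diagnostics. It does not
replace switching the character set to utf8mb4.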

Some references:

https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql

https://mariadb.com/kb/en/library/supported-character-sets-and-collations/

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434


On 23.01.2019 at 10:07, Karl Wright wrote:
> It's critical, with Manifold, that the database instance be capable of
> handling any characters it's likely to encounter.  For PostgreSQL we tell
> people to install it with the UTF-8 encoding, for instance, and when we
> create database instances ourselves we try to specify that as well.  For
> MariaDB, have a look at the database implementation we've got, and let me
> know if this is something we're missing anywhere?
> 
> Thanks,
> Karl
> 
> 
> On Wed, Jan 23, 2019 at 3:00 AM Markus Schuch wrote:
> 
>> Hi,
>>
>> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
>> situation caused by a 4-byte UTF-8 character (e.g. U+1F3AE) in a string
>> inserted into one of the varchar columns.
>>
>> In my case a connector wrote the title of a parent document into the
>> version string of the processed document, which contained the character
>> U+1F3AE - a gamepad :)
>>
>> This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
>> for column 'lastversion' at row 1" in MySQL, because the utf8 character
>> set does not support that kind of character (utf8mb4 does).
>>
>> The cause was hard to find, because it somehow led to a transaction
>> abort loop in the incremental ingester and the error was not logged
>> properly.
>>
>> My questions:
>> - Should we create the MySQL database with utf8mb4 by default?
>> - Or should such 4-byte characters be stripped from inserted strings?
>> - Or should 22001 be handled better?
>>
>> Thanks in advance
>> Markus
>>
> 

