Re: [freedom] UTF8 vs UTF8MB4

Saifi Khan Thu, 13 Mar 2025 13:11:48 -0700

On Thu, 13 Mar 2025, RAGINI wrote:

 Has anybody written SQL similar to this?
 CREATE DATABASE testdb DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

 What is the difference between UTF8 and UTF8MB4?

 I need help to understand the memory implication of
 - CHARACTER SET utf8mb4
 - COLLATE utf8mb4_unicode_ci

 Any pointers?
Since you didn't mention the database, I assume you are talking about MySQL, as it is the one I know of that has the concept of UTF8MB4.
UTF8MB4 is MySQL's full implementation of UTF-8, because MySQL's UTF8 character set (which is actually utf8mb3) can store at most 3 bytes per character.
So UTF8 (UTF8MB3) in MySQL cannot store all Unicode code points: characters outside the Basic Multilingual Plane, such as most emoji, need 4 bytes.
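To see why the 3-byte limit matters, here is a small Python sketch (Python is only a convenient way to inspect UTF-8 encodings; the helper names are made up for this illustration) that reports how many bytes a few characters need in UTF-8 and whether they would fit within utf8mb3's 3-byte limit:

```python
def utf8_len(ch: str) -> int:
    """Bytes needed to encode a single character as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (MySQL's utf8/utf8mb3 limit)."""
    return all(utf8_len(ch) <= 3 for ch in text)


# Latin letter, accented letter, rupee sign, and an emoji (outside the BMP).
for ch in ["A", "\u00e9", "\u20b9", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), fits utf8mb3: {fits_utf8mb3(ch)}")
```

Only the emoji (U+1F600) needs 4 bytes, so it is the one character here that a utf8mb3 column cannot store.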

This is the understanding I have of UTF8MB4.
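For character set UTF8MB4, the storage implication is that each character takes 1 to 4 bytes depending on the code point, so size limits such as the maximum row size and index key length are calculated assuming up to 4 bytes per character.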
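A rough way to see the storage effect is to compute the worst-case size of a VARCHAR column for each character set. This Python sketch is a simplification under stated assumptions: it counts only the data bytes plus MySQL's 1- or 2-byte VARCHAR length prefix and ignores other row overhead, and the function names are hypothetical. The 767-byte figure is InnoDB's index key prefix limit for the COMPACT and REDUNDANT row formats (DYNAMIC and COMPRESSED allow 3072 bytes), which is why VARCHAR(191) shows up so often with utf8mb4.

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def varchar_max_bytes(n_chars: int, charset: str) -> int:
    """Worst-case bytes for a VARCHAR(n_chars) value, including its length prefix.

    Simplification: ignores row-format overhead. MySQL uses a 1-byte length
    prefix when values need at most 255 bytes, otherwise a 2-byte prefix.
    """
    data_bytes = n_chars * MAX_BYTES_PER_CHAR[charset]
    prefix_bytes = 1 if data_bytes <= 255 else 2
    return data_bytes + prefix_bytes


def max_indexable_chars(key_limit_bytes: int, charset: str) -> int:
    """Longest column (in characters) whose full value fits within an index key limit."""
    return key_limit_bytes // MAX_BYTES_PER_CHAR[charset]


for cs in MAX_BYTES_PER_CHAR:
    print(cs, varchar_max_bytes(255, cs), max_indexable_chars(767, cs))
```

For example, a VARCHAR(255) value can need up to 1022 bytes as utf8mb4 versus 767 as utf8mb3, and only 191 utf8mb4 characters fit in a 767-byte index key.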
For COLLATE utf8mb4_unicode_ci, the collation controls how strings are compared and sorted, not how much space they take. For example, "abc" is treated as equal to "ABC"; the "ci" in utf8mb4_unicode_ci stands for case-insensitive.
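The idea behind a case-insensitive collation can be illustrated with a comparison key: two strings are equal if their keys are equal. This Python sketch is only an approximation and not MySQL's implementation; utf8mb4_unicode_ci uses Unicode Collation Algorithm weights, while the sketch simply strips accents and case-folds. Accents are dropped because MySQL's _unicode_ci collations also treat accented and unaccented letters as equal. The helper names are hypothetical.

```python
import unicodedata


def rough_unicode_ci_key(s: str) -> str:
    """Very rough stand-in for a case- and accent-insensitive collation key.

    Decomposes characters (NFD), drops combining marks such as accents,
    then case-folds. This is NOT MySQL's UCA-based weighting.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def ci_equal(a: str, b: str) -> bool:
    """Compare two strings the way a case/accent-insensitive collation roughly would."""
    return rough_unicode_ci_key(a) == rough_unicode_ci_key(b)


print(ci_equal("abc", "ABC"))                  # case differs only
print(ci_equal("r\u00e9sum\u00e9", "RESUME"))  # case and accents differ
print(ci_equal("abc", "abd"))                  # different letters
```

The first two comparisons print True and the last prints False; a case-sensitive (binary) collation would report all three as different.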
There is a better explanation on Stack Overflow:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql


Very helpful.

I didn't know that UTF8MB4 is native to MySQL.

Is this also supported in PostgreSQL?

There could be a use case for MySQL-to-PostgreSQL migration. Just a thought.


warm regards
Saifi.
