On Thu, 13 Mar 2025, RAGINI wrote:

 Has somebody written a SQL similar to this

 CREATE DATABASE testdb DEFAULT CHARACTER SET utf8mb4 COLLATE
 utf8mb4_unicode_ci;

 What is the difference between UTF8 and UTF8MB4 ?

 I need help to understand the memory implication of
 - CHARACTER SET utf8mb4
 - COLLATE utf8mb4_unicode_ci

 Any pointers?

Since you didn't mention the database, I assume you are talking about MySQL, since it is the one I know that has the concept of UTF8MB4.

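UTF8MB4 is MySQL's full implementation of UTF-8. The character set that MySQL calls UTF8 (which is actually utf8mb3) can store at most 3 bytes per character.
As a result, UTF8 (utf8mb3) in MySQL cannot store all Unicode code points; anything that needs 4 bytes in UTF-8, such as most emoji, does not fit.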
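
To make the 3-byte limit concrete, here is a small Python sketch (Python only for illustration, this is not MySQL code). It counts how many bytes a character takes in UTF-8 and checks whether a string would fit in a 3-byte-per-character encoding. The helper names and the sample characters are my own.

```python
# Illustration only: how many bytes does each character need in UTF-8?

def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (hypothetical helper)."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji
    for ch in samples:
        print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")
    print(fits_in_utf8mb3("caf\u00e9 \u20ac"))   # only 1 to 3 byte characters
    print(fits_in_utf8mb3("hi \U0001F600"))      # contains a 4-byte character
```

The emoji is the case that needs utf8mb4; the other three samples fit in 3 bytes or fewer.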

This is the understanding I have of UTF8MB4.

For character set UTF8MB4, the memory implication is that each character can take up to 4 bytes to store. It is a variable-length encoding, so plain ASCII text still takes 1 byte per character.
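
A rough way to reason about the worst case is simple arithmetic, sketched below in Python. The 767-byte figure is an assumption on my part: it is the index key prefix limit I remember for the older InnoDB row formats, so please check the documentation for your MySQL version. The function names are made up for this example.

```python
# Worst-case sizing arithmetic (a sketch, not MySQL's actual storage logic).

def max_bytes(num_chars: int, bytes_per_char: int) -> int:
    """Upper bound on bytes needed for num_chars characters."""
    return num_chars * bytes_per_char


def max_indexable_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """How many characters fit under a byte limit in the worst case."""
    return limit_bytes // bytes_per_char


# Assumption: 767 bytes is the index prefix limit on older InnoDB row formats.
ASSUMED_INDEX_LIMIT = 767

if __name__ == "__main__":
    print(max_bytes(255, 3))                             # VARCHAR(255), 3 bytes/char
    print(max_bytes(255, 4))                             # VARCHAR(255), 4 bytes/char
    print(max_indexable_chars(ASSUMED_INDEX_LIMIT, 3))   # worst case with utf8mb3
    print(max_indexable_chars(ASSUMED_INDEX_LIMIT, 4))   # worst case with utf8mb4
```

Under that assumption, a column that could be fully indexed at 255 characters with 3-byte characters only fits 191 characters with 4-byte characters, which is why VARCHAR(191) shows up in many schemas.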

For collate utf8mb4_unicode_ci: the collation controls how strings are compared and sorted. For example, "abc" would be treated as equal to "ABC"; the "ci" in utf8mb4_unicode_ci stands for case-insensitive.
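
To give a feel for what a case-insensitive comparison does, here is a rough Python approximation. It is NOT the algorithm MySQL uses; it only mimics the visible effect by case-folding and, since as far as I know utf8mb4_unicode_ci also ignores accents, by stripping combining marks. The function names are hypothetical.

```python
import unicodedata

# Rough approximation of a case- and accent-insensitive comparison.
# This does not reproduce MySQL's collation rules; it only shows the idea.

def rough_ci_key(text: str) -> str:
    """Build a comparison key: decompose, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def rough_ci_equal(a: str, b: str) -> bool:
    """True if the two strings compare equal under the rough key."""
    return rough_ci_key(a) == rough_ci_key(b)


if __name__ == "__main__":
    print(rough_ci_equal("abc", "ABC"))
    print(rough_ci_equal("r\u00e9sum\u00e9", "RESUME"))
    print(rough_ci_equal("abc", "abd"))
```

The practical consequence in the database is that a WHERE clause or a unique index on such a column treats "abc" and "ABC" as the same value.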

There is a better explanation here on Stack Overflow:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql


Very helpful.

I didn't know that UTF8MB4 is native to MySQL.

Is this also supported on PostgreSQL ?

There can be a use case for MySQL to PostgreSQL migration. Just a thought.


warm regards
Saifi.
