On Thu, 13 Mar 2025, RAGINI wrote:
Has anybody written SQL similar to this?
CREATE DATABASE testdb DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
What is the difference between UTF8 and UTF8MB4?
I need help to understand the memory implications of
- CHARACTER SET utf8mb4
- COLLATE utf8mb4_unicode_ci
Any pointers?
Since you didn't mention the database, I assume you are talking about MySQL,
since it is the one I know of that has the concept of UTF8MB4.
UTF8MB4 is MySQL's full implementation of UTF-8. The character set that MySQL
calls UTF8 is actually utf8mb3, which can store at most 3 bytes per character.
As a result, UTF8 (utf8mb3) in MySQL cannot store every Unicode code point.
This is my understanding of UTF8MB4.
For the character set utf8mb4, the memory implication is that each character
can take up to 4 bytes of storage.
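To make the byte counts concrete, here is a minimal Python sketch. It is not
MySQL itself, only standard UTF-8 encoding, and the helper names utf8_len and
fits_in_utf8mb3 are made up for this illustration. It shows that a character
takes 1 to 4 bytes depending on the character, and that only the 4-byte ones
are the problem for utf8mb3.

```python
# Illustration only: uses Python's UTF-8 encoder, not a MySQL server.
# Helper names are hypothetical and exist only for this sketch.

def utf8_len(ch: str) -> int:
    """Number of bytes one character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character in text needs at most 3 bytes in UTF-8."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # A plain letter, an accented letter, the euro sign, and an emoji.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

    print(fits_in_utf8mb3("caf\u00e9 5\u20ac"))   # no 4-byte characters
    print(fits_in_utf8mb3("hello \U0001F600"))    # contains a 4-byte emoji
```

The emoji is the typical case that needs the fourth byte, which is why text
containing emoji is the usual reason to pick utf8mb4.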
For the collation utf8mb4_unicode_ci, it means that, for example, "abc" is
treated as equal to "ABC" when comparing and sorting; the "ci" in
utf8mb4_unicode_ci stands for case-insensitive.
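The idea of a case-insensitive comparison can be sketched in Python as well.
This is only an analogy: ci_equal is a made-up helper built on str.casefold,
and MySQL's collation follows its own rules, which this sketch does not
reproduce.

```python
# Illustration only: mimics the idea of a case-insensitive comparison.
# ci_equal is a hypothetical helper, not MySQL's collation algorithm.

def ci_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring letter case."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print("abc" == "ABC")          # ordinary comparison is case-sensitive
    print(ci_equal("abc", "ABC"))  # case-insensitive comparison
    print(ci_equal("abc", "abd"))  # different letters still differ
```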
There is a better explanation here on Stack Overflow:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
Very helpful.
I didn't know that UTF8MB4 is native to MySQL.
Is this also supported on PostgreSQL?
There could be a use case for a MySQL to PostgreSQL migration. Just a thought.
warm regards
Saifi.