Re: [freedom] UTF8 vs UTF8MB4

RAGINI Thu, 13 Mar 2025 04:30:40 -0700

Has anybody written SQL similar to this?

CREATE DATABASE testdb DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;


What is the difference between UTF8 and UTF8MB4?

I need help to understand the memory implications of
- CHARACTER SET utf8mb4
- COLLATE utf8mb4_unicode_ci

Any pointers?


Since you didn't mention the database, I assume you are talking about MySQL,
since it is the one I know of that has the concept of utf8mb4.

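utf8mb4 is MySQL's full implementation of UTF-8. What MySQL calls utf8 is
actually an alias for utf8mb3, which can store at most 3 bytes per character.
As a result, utf8 (utf8mb3) in MySQL cannot store every Unicode code point:
anything outside the Basic Multilingual Plane, such as most emoji, needs 4 bytes.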
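
To make the 3-byte limit concrete, here is a small sketch in Python. It uses Python's own UTF-8 encoder, not MySQL, and the helper names utf8_len and fits_in_utf8mb3 are made up for this illustration. It only shows how many bytes a few sample characters need in UTF-8, and therefore which ones would not fit in a 3-byte-per-character encoding.

```python
# Illustration only: this is Python's UTF-8 encoder, not MySQL.
# The helper names below are hypothetical, chosen for this example.

def utf8_len(ch: str) -> int:
    """Return the number of bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character in text needs at most 3 UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


# Sample characters: ASCII letter, e-acute, euro sign, grinning-face emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")

print(fits_in_utf8mb3("caf\u00e9 \u20ac5"))   # only 1- to 3-byte characters
print(fits_in_utf8mb3("hi \U0001F600"))       # contains a 4-byte character
```

The emoji is the only sample that needs 4 bytes, which is the kind of character utf8mb4 exists to handle.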

This is the understanding I have of UTF8MB4.

For the character set utf8mb4, the memory implication is that each character
can take up to 4 bytes to store.
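
As a rough back-of-the-envelope sketch (again in Python, not MySQL), the difference shows up as an upper bound on bytes for a given number of characters. The function names worst_case_bytes and actual_bytes are hypothetical. How MySQL actually allocates storage depends on the column type and storage engine, which this sketch does not model.

```python
# Illustration only: simple arithmetic plus Python's UTF-8 encoder.
# Function names are hypothetical; MySQL's real storage layout is not modelled.

def worst_case_bytes(n_chars: int, max_bytes_per_char: int) -> int:
    """Upper bound on bytes needed for n_chars characters."""
    return n_chars * max_bytes_per_char


def actual_bytes(text: str) -> int:
    """Bytes the given text really occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


n = 255  # assumed column length in characters, picked for the example
print("3 bytes/char upper bound:", worst_case_bytes(n, 3))
print("4 bytes/char upper bound:", worst_case_bytes(n, 4))

# UTF-8 is variable-length, so plain ASCII text still uses 1 byte per character.
print(actual_bytes("hello"))
print(actual_bytes("hello \U0001F600"))
```

The upper bound grows with the 4-byte maximum, but text that only contains ASCII characters encodes to the same number of bytes either way.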

For the collation utf8mb4_unicode_ci, it means that, for example, "abc" is
treated as equal to "ABC" in comparisons; the "ci" in utf8mb4_unicode_ci
stands for case-insensitive.
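
As a loose analogy only, the idea of a case-insensitive comparison can be sketched in Python with str.casefold. This is not MySQL's collation algorithm, and the helper name equal_ignoring_case is made up for the example. It just shows two strings that differ as raw text comparing equal once case is ignored.

```python
# Analogy only: Python's str.casefold, not MySQL's utf8mb4_unicode_ci rules.
# The helper name is hypothetical.

def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after folding away case differences."""
    return a.casefold() == b.casefold()


print("abc" == "ABC")                      # exact comparison
print(equal_ignoring_case("abc", "ABC"))   # case-insensitive comparison
print(equal_ignoring_case("abc", "abd"))   # different letters still differ
```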

There is a better explanation here on Stack Overflow:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql

Does this help?

Warm regards
Ragini.
