Has anybody written SQL similar to this?
CREATE DATABASE testdb DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
What is the difference between UTF8 and UTF8MB4?
I need help understanding the memory implications of
- CHARACTER SET utf8mb4
- COLLATE utf8mb4_unicode_ci
Any pointers?
Since you didn't mention the database, I assume you are talking about MySQL,
since it is the one I know of that has the concept of UTF8MB4.
UTF8MB4 is MySQL's full implementation of UTF-8. The character set MySQL calls
UTF-8 (which is actually utf8mb3) can store at most 3 bytes per character.
So UTF-8 (UTF8MB3) in MySQL cannot store every Unicode code point; characters
that need 4 bytes in UTF-8, such as emoji, do not fit.
This is my understanding of UTF8MB4.
For character set UTF8MB4, the memory implication is that each character can
take up to 4 bytes to store.
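To make the byte counts concrete, here is a small Python sketch (not MySQL itself) that prints how many bytes a few sample characters take in UTF-8. The helper name utf8_len and the sample characters are my own picks for illustration; the "fits in utf8mb3" column simply assumes the 3-byte limit described above.

```python
# Sample characters chosen for illustration, written as escapes so the
# code points are unambiguous.
SAMPLES = {
    "A": "Latin letter (U+0041)",
    "\u00e9": "accented Latin letter (U+00E9)",
    "\u20ac": "euro sign (U+20AC)",
    "\U0001F600": "emoji (U+1F600)",
}


def utf8_len(ch: str) -> int:
    """Return the number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch, desc in SAMPLES.items():
    n = utf8_len(ch)
    # Assumption from the text above: utf8mb3 holds at most 3 bytes per character.
    fits = "yes" if n <= 3 else "no"
    print(f"{desc}: {n} byte(s), fits in utf8mb3: {fits}")
```

Only the emoji needs the fourth byte, so plain ASCII text should not take more bytes per character in UTF-8 just because the column is declared utf8mb4; 4 bytes is the per-character maximum.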
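For collate utf8mb4_unicode_ci, it means that, for example, "abc" would be
treated as equal to "ABC" when comparing; the "ci" in utf8mb4_unicode_ci stands
for case insensitive. As far as I know, the collation only affects how strings
are compared and sorted, not how many bytes are stored.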
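As a rough analogy only, the Python sketch below shows what a case-insensitive comparison looks like. It is not MySQL's collation algorithm (utf8mb4_unicode_ci follows Unicode collation rules that go beyond simple case folding), and the helper name same_ignoring_case is made up for this example.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings while ignoring letter case.

    Hypothetical helper: only a loose analogy for the "ci" part of a
    collation name, not a reimplementation of any MySQL collation.
    """
    return a.casefold() == b.casefold()


print(same_ignoring_case("abc", "ABC"))  # the strings differ only by case
print(same_ignoring_case("abc", "abd"))  # the strings differ by a letter

# The stored text itself is untouched: both spellings keep their own bytes.
print("abc".encode("utf-8"), "ABC".encode("utf-8"))
```

The point of the analogy is that the comparison rule changes, while the encoded bytes of each string stay the same.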
There is a better explanation here on Stack Overflow:
https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql
Does this help?
Warm regards
Ragini.