Also, I believe for char(x) character set utf8, the disk storage is 4 bytes per UTF8 character (not 3), as recorded in the metadata table COLUMNS, to cover the longest (4-byte) UTF-8 encodings.
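
Side note for anyone reproducing the session below: the metadata query there uses a hard-coded object_uid. An untested sketch of the same lookup by table name, assuming the usual CATALOG_NAME, SCHEMA_NAME, and OBJECT_NAME columns of "_MD_".OBJECTS:

select c.column_name, c.column_size
from "_MD_".objects o, "_MD_".columns c
where o.object_uid = c.object_uid
  and o.catalog_name = 'TRAFODION'
  and o.schema_name  = 'SEABASE'
  and o.object_name  = 'TAO2';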
You can insert a utf8 character of 4 bytes into either version of the column. --Qifan

>>showddl tao2;

CREATE TABLE TRAFODION.SEABASE.TAO2
  (
    A        CHAR(10 CHARS) CHARACTER SET UTF8 COLLATE DEFAULT
      DEFAULT NULL NOT SERIALIZED
  , B        CHAR(10 BYTES) CHARACTER SET UTF8 COLLATE DEFAULT
      DEFAULT NULL NOT SERIALIZED
  )
  ATTRIBUTES ALIGNED FORMAT
  NAMESPACE 'TRAF_RSRVD_3'
;

-- GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON
--   TRAFODION.SEABASE.TAO2 TO DB__ROOT WITH GRANT OPTION;

--- SQL operation complete.

>>select cast(column_name as char(10)), column_size from "_MD_".columns
>>where object_uid = 8958932759837396480;

(EXPR)                                    COLUMN_SIZE
----------------------------------------  -----------
A                                                  40
B                                                  10
SYSKEY                                              8

>>insert into tao2 values(_utf8 x'F0 90 8D 88', _utf8 x'F0 90 8D 88');

--- 1 row(s) inserted.
>>

________________________________
From: Eric Owhadi <eric.owh...@esgyn.com>
Sent: Wednesday, December 6, 2017 11:40:27 AM
To: dev@trafodion.incubator.apache.org
Subject: RE: Is varchar(300 bytes) character set utf8 better than varchar(100 chars) character set utf8?

Hi Yuan,

See my comments inline. I am puzzled by your finding.

Eric

-----Original Message-----
From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, December 6, 2017 1:48 AM
To: dev@trafodion.incubator.apache.org
Subject: Is varchar(300 bytes) character set utf8 better than varchar(100 chars) character set utf8?

Hi Trafodioneers,

I have a column that needs to store 1 to 100 characters. The characters may include both English characters (such as ‘a’, ‘b’, ...) and Chinese characters (such as ‘你’, ‘我’, ...).

So I can define the column type as varchar(100 chars) character set utf8; in that case each character, whether English or Chinese, will take 3 bytes. For example, ‘abc你我’ will consume 15 bytes.

>>> No, the space consumed by the string should not depend on whether you
>>> declared the max length in chars or in bytes. Your string should always
>>> be 9 bytes wide. Did you observe the string being 15 bytes wide?

So I am considering changing the column type to varchar(100 bytes) character set utf8; in that case ‘abc你我’ will consume only 9 bytes. However, a string with more than 33 Chinese characters then cannot be inserted, because 34 Chinese characters already take 102 bytes.

The worst case is to define the column type as varchar(300 bytes) character set utf8, which leaves room for 100 characters even if all of them are Chinese.

In general, I think varchar(300 bytes) character set utf8 is better than varchar(100 chars) character set utf8 if not all characters are Chinese. Is that true?

Best regards,
Yuan
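
To answer Eric's question empirically, the byte width can be measured directly. Below is an untested sketch, assuming the CHAR_LENGTH and OCTET_LENGTH functions from the Trafodion SQL Reference and a hypothetical scratch table T (none of which appear in the thread above):

create table t
  ( c1 varchar(100 chars) character set utf8
  , c2 varchar(300 bytes) character set utf8
  );

insert into t values (_utf8'abc你我', _utf8'abc你我');

-- If storage depends only on the actual encoding, both columns should
-- report 5 characters and 9 bytes: three 1-byte ASCII characters plus
-- two 3-byte Chinese characters.
select char_length(c1), octet_length(c1),
       char_length(c2), octet_length(c2)
  from t;

If this reports 9 bytes for both columns, it supports Eric's point that for varchar the chars/bytes declaration only changes the declared maximum, not the per-value storage.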