Also, I believe for char(x) character set utf8, the disk storage is 4 bytes per
UTF-8 character (not 3), as recorded in the metadata table COLUMNS, to cover
4-byte UTF-8 characters.


You can insert a 4-byte UTF-8 character into either version of the column.


--Qifan



>>showddl tao2;

CREATE TABLE TRAFODION.SEABASE.TAO2
  (
    A                                CHAR(10 CHARS) CHARACTER SET UTF8 COLLATE
      DEFAULT DEFAULT NULL NOT SERIALIZED
  , B                                CHAR(10 BYTES) CHARACTER SET UTF8 COLLATE
      DEFAULT DEFAULT NULL NOT SERIALIZED
  )
 ATTRIBUTES ALIGNED FORMAT NAMESPACE 'TRAF_RSRVD_3'
;


-- GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON TRAFODION.SEABASE.TAO2 TO DB__ROOT WITH GRANT OPTION;


--- SQL operation complete.

>>


>>select cast(column_name as char(10)), column_size from "_MD_".columns where object_uid = 8958932759837396480;


(EXPR)                                    COLUMN_SIZE
----------------------------------------  -----------
A                                                  40
B                                                  10
SYSKEY                                              8


>>insert into tao2 values(_utf8 x'F0 90 8D 88', _utf8 x'F0 90 8D 88');

--- 1 row(s) inserted.
>>
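The literal in the INSERT above, x'F0 90 8D 88', decodes to U+10348, a character outside the Basic Multilingual Plane that needs 4 bytes in UTF-8 -- which is why the CHARS column reserves 4 bytes per character. A quick Python sketch (Python here is only for illustration; the column sizing itself is Trafodion's) confirms the byte count:

```python
# Decode the 4-byte UTF-8 sequence used in the INSERT above.
raw = bytes.fromhex('F0908D88')
ch = raw.decode('utf-8')

print(hex(ord(ch)))             # 0x10348 (outside the BMP)
print(len(ch.encode('utf-8')))  # 4 bytes in UTF-8
```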
________________________________
From: Eric Owhadi <eric.owh...@esgyn.com>
Sent: Wednesday, December 6, 2017 11:40:27 AM
To: dev@trafodion.incubator.apache.org
Subject: RE: Is varchar(300 bytes) character set utf8 better than varchar(100 
chars) character set utf8?

Hi Yuan,
See my comments inline. I am puzzled by your findings.
Eric

-----Original Message-----
From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, December 6, 2017 1:48 AM
To: dev@trafodion.incubator.apache.org
Subject: Is varchar(300 bytes) character set utf8 better than varchar(100 
chars) character set utf8?

Hi Trafodioneers,

I have a column that needs to store 1 to 100 characters. The characters may
include both English characters (such as ‘a’, ‘b’) and Chinese characters
(such as ‘你’, ‘我’).
So I can define the column type as varchar(100 chars) character set utf8; in
that case every character, whether English or Chinese, will take 3 bytes of
space. For example, ‘abc你我’ will consume 15 bytes.


>>> No, the storage consumed by the string should not depend on whether you
>>> declared the max length in chars or in bytes.
>>> Your string should always be 9 bytes wide. Did you observe the string being
>>> 15 bytes wide?

So I considered changing the column type to varchar(100 bytes) character set
utf8; in that case ‘abc你我’ will consume 9 bytes. However, a string with 34 or
more Chinese characters cannot be inserted, because 34 Chinese characters
already take more than 100 bytes. The worst case is to define the column type
as varchar(300 bytes) character set utf8.

In general, I think varchar(300 bytes) character set utf8 is better than
varchar(100 chars) character set utf8 if not all characters are Chinese. Is
that true?
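The byte arithmetic above is easy to verify (a Python sketch for illustration; the 3-bytes-per-Chinese-character figure holds because common CJK ideographs sit in the BMP range U+0800..U+FFFF):

```python
s = 'abc你我'
print(len(s))                  # 5 characters
print(len(s.encode('utf-8')))  # 9 bytes: 1 each for a, b, c; 3 each for 你, 我

# A VARCHAR(100 BYTES) column caps the *encoded* length at 100 bytes:
limit = 100
print(len(('你' * 34).encode('utf-8')) <= limit)  # False: 34 * 3 = 102 bytes
print(len(('你' * 33).encode('utf-8')) <= limit)  # True:  33 * 3 =  99 bytes
```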

Best regards,
Yuan
