Hi,

UCS2 and UTF8 are both encodings of the Unicode character set.

UCS2 is an older encoding. Each character is encoded as 16 bits, so it covers 
only the first 65,536 code points (the Basic Multilingual Plane) and not all 
of the Unicode character set. That range does include the most commonly used 
Chinese characters; the main CJK block, U+4E00 through U+9FFF, has room for 
over 20,000 of them. For historical reasons, the Trafodion metadata tables use 
UCS2 for columns containing object names.

On the other hand, UTF8 is a newer encoding, and can encode the entire Unicode 
character set. There are other encodings, such as UTF-16 and UTF-32, which 
Trafodion does not currently support.

So, there are characters in Unicode that can be represented in UTF8 but cannot 
be represented in UCS2.
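
To make that concrete, here is a small Python sketch (not Trafodion code; the 
helper name fits_in_ucs2 is made up for illustration). It assumes "fits in 
UCS2" simply means every code point is at most U+FFFF:

```python
def fits_in_ucs2(text: str) -> bool:
    """Return True if every code point fits in a single 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


common = "\u4e2d"      # a very common Chinese character (U+4E2D)
rare = "\U00020000"    # a rare ideograph from CJK Extension B (U+20000)

print(fits_in_ucs2(common))        # True
print(fits_in_ucs2(rare))          # False
print(len(rare.encode("utf-8")))   # 4 (UTF8 can still encode it, in 4 bytes)
```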

In terms of performance, it's a mixed bag. It depends on the data you are 
storing and what you are doing with it. For example, Chinese characters 
typically are 3 bytes in UTF8, but 2 bytes in UCS2. But some of the less 
frequently used Chinese characters can be represented only in UTF8.

If you have a mix of ASCII data and Chinese characters stored in a column, the 
most efficient character set will depend on the ratio of ASCII to Chinese 
characters.
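
A quick way to see the trade-off is to compare encoded sizes in Python. This 
is only a sketch: the function name encoded_sizes is hypothetical, and it uses 
Python's "utf-16-le" codec as a stand-in for UCS2, which gives the same bytes 
as long as the text stays within the 16-bit range.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF8 byte count, UCS2 byte count) for the given text."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" matches UCS2 for text limited to 16-bit code points.
    ucs2_bytes = len(text.encode("utf-16-le"))
    return utf8_bytes, ucs2_bytes


print(encoded_sizes("hello world"))   # (11, 22)  ASCII only: UTF8 is smaller
print(encoded_sizes("中文字符"))       # (12, 8)   Chinese only: UCS2 is smaller
print(encoded_sizes("a中"))           # (4, 4)    one of each: break-even
```

With 1 byte versus 2 for ASCII and 3 bytes versus 2 for a typical Chinese 
character, the two encodings break even when a column holds about as many 
ASCII characters as Chinese ones.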

If you are doing a lot of string operations such as SUBSTRING or POSITION, UCS2 
is more efficient since it is a fixed width encoding. (I can go directly to the 
10th character, for example, but in UTF8 one has to start at the beginning of 
the string and count characters.)

I hope this helps,

Dave



-----Original Message-----
From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn] 
Sent: Sunday, November 12, 2017 6:18 PM
To: dev@trafodion.incubator.apache.org
Subject: About UCS2

Hi Trafodioneers,

Trafodion has three main charsets: ISO88591, UTF8 and UCS2.

As far as I know, ISO88591 is the default charset when we define 
char/varchar; it is a single-byte character set.
UTF8 is mainly used if we want to store multi-byte text such as Chinese 
characters.
But what about UCS2? I have never used UCS2 before. What is a suitable use 
case for UCS2?

Best regards,
Yuan
