Hans Zeller created TRAFODION-2477:
--------------------------------------

             Summary: Invalid characters in UCS2 to UTF8 translation are not handled correctly
                 Key: TRAFODION-2477
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2477
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.0-incubating
            Reporter: Hans Zeller
            Assignee: Hans Zeller


When translating from UCS-2 to UTF-8, using CAST or TRANSLATE(... UCS2TOUTF8), 
every valid character maps directly to a UTF-8 sequence. However, invalid code 
points and invalid UTF-16 surrogate pairs cannot be translated and should be 
treated as errors. Right now we silently suppress those errors. Instead, we 
should either translate the invalid input to the Unicode "replacement 
character" U+FFFD or raise an error. Ideally, a CQD should decide which of 
these two actions to take.

Test case:

create table tbaducs2(a char(10) character set ucs2);

-- DC00 is a low-order UTF-16 surrogate, on its own this is invalid
insert into tbaducs2 values(_ucs2 X'DC000041');

select translate(a using ucs2toutf8) from tbaducs2;
-- this returns an empty string - no error, no replacement character
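
For illustration, the two proposed behaviors can be sketched outside the engine. The Python below is only a sketch under stated assumptions, not Trafodion code: the function name ucs2_to_utf8 and the on_error parameter (standing in for the proposed CQD) are hypothetical, only unpaired surrogates are treated as invalid, and the blank padding of the char(10) column is ignored.

```python
REPLACEMENT = 0xFFFD  # Unicode replacement character


def ucs2_to_utf8(units, on_error="replace"):
    """Convert 16-bit code units to UTF-8 bytes.

    on_error is a hypothetical stand-in for the proposed CQD:
    "replace" substitutes U+FFFD, "error" raises ValueError.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: invalid on its own.
            if on_error == "error":
                raise ValueError(f"unpaired surrogate U+{u:04X} at index {i}")
            cp = REPLACEMENT
            i += 1
        else:
            # Ordinary BMP character.
            cp = u
            i += 1
        out += chr(cp).encode("utf-8")
    return bytes(out)


# The value from the test case above: X'DC000041' is DC00 followed by 0041 ('A').
print(ucs2_to_utf8([0xDC00, 0x0041]))
```

With on_error="replace", the lone DC00 becomes the three-byte UTF-8 encoding of U+FFFD and the 'A' is preserved; with on_error="error", the call raises instead of silently returning an empty string.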




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
