[ 
https://issues.apache.org/jira/browse/TRAFODION-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Zeller updated TRAFODION-2515:
-----------------------------------
    Description: 
When we convert text to a character set and encounter an invalid character, we 
should translate it into the "replacement character" of that character set. For 
ASCII and ISO-8859-1, we just use a question mark, since there is no special 
replacement character. When we convert to Unicode, however, we should use 
U+FFFD as the replacement character (often displayed as a black diamond with a 
question mark inside).

Test case:

cqd TRANSLATE_ERROR 'off';
select converttohex(TRANSLATE(_ucs2 X'D8340041' using UCS2toUTF8)) from 
(values(0))x;

The source value is an invalid bit pattern (D834 is a high surrogate that is 
not followed by a low surrogate) followed by "A" (0041). Right now the result 
is 3F41, which is "?A" when read as Unicode or ASCII text. With the correct 
replacement character, the result should be EFBFBD41, with EFBFBD being the 
UTF-8 encoding of U+FFFD.
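
For illustration only, here is a minimal Python sketch of the expected 
behavior. It is not Trafodion code: the function name ucs2_to_utf8 and the 
unicode_target flag are hypothetical, and it assumes the source holds 
big-endian 16-bit code units in which a properly paired surrogate is 
acceptable and any unpaired surrogate counts as an invalid character.

```python
REPLACEMENT = "\uFFFD"  # Unicode replacement character


def ucs2_to_utf8(raw: bytes, unicode_target: bool = True) -> bytes:
    """Convert big-endian 16-bit code units to UTF-8 (hypothetical helper).

    Invalid units (unpaired surrogates) are replaced with U+FFFD, or with
    "?" when unicode_target is False, mimicking the behavior reported today.
    A trailing odd byte, if any, is ignored in this sketch.
    """
    substitute = REPLACEMENT if unicode_target else "?"
    units = [int.from_bytes(raw[i:i + 2], "big")
             for i in range(0, len(raw) - 1, 2)]
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # Valid surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: emit the substitute, consume only this unit.
            out.append(substitute)
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out).encode("utf-8")


if __name__ == "__main__":
    source = bytes.fromhex("D8340041")
    print(ucs2_to_utf8(source).hex().upper())                        # expected
    print(ucs2_to_utf8(source, unicode_target=False).hex().upper())  # current
```

Only the invalid code unit is consumed when substituting, so the "A" that 
follows it survives in both outputs.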

  was:When we convert text to a character set and encounter an invalid 
character, we should translate it into the "replacement character" of that 
character set. For ASCII and ISO-8859-1, we just use a question mark, since 
there is no special replacement character. When we convert to Unicode, 
however, we should use U+FFFD as the replacement character (often displayed as 
a black diamond with a question mark inside).


> Question mark instead of Unicode replacement character is used
> --------------------------------------------------------------
>
>                 Key: TRAFODION-2515
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2515
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-general
>    Affects Versions: 2.0-incubating
>            Reporter: Hans Zeller
>            Priority: Minor
>
> When we convert text to a character set and encounter an invalid character, 
> we should translate it into the "replacement character" of that character 
> set. For ASCII and ISO-8859-1, we just use a question mark, since there is 
> no special replacement character. When we convert to Unicode, however, we 
> should use U+FFFD as the replacement character (often displayed as a black 
> diamond with a question mark inside).
> Test case:
> cqd TRANSLATE_ERROR 'off';
> select converttohex(TRANSLATE(_ucs2 X'D8340041' using UCS2toUTF8)) from 
> (values(0))x;
> The source value is an invalid bit pattern (D834 is a high surrogate that is 
> not followed by a low surrogate) followed by "A" (0041). Right now the result 
> is 3F41, which is "?A" when read as Unicode or ASCII text. With the correct 
> replacement character, the result should be EFBFBD41, with EFBFBD being the 
> UTF-8 encoding of U+FFFD.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
