[https://issues.apache.org/jira/browse/TRAFODION-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891225#comment-15891225]
ASF GitHub Bot commented on TRAFODION-2477:
-------------------------------------------
GitHub user zellerh opened a pull request:
https://github.com/apache/incubator-trafodion/pull/986
[TRAFODION-2477] Invalid characters in translation are ignored
Right now we ignore such invalid characters and may also truncate
the string at the position of the invalid character. The expected
behavior is to raise an error.
The only kind of invalid data I could create with regular SQL syntax
is an invalid UTF-16 surrogate pair, and we have no checks that detect
those when the data is entered. Invalid UTF-8, on the other hand, is
rejected when we try to insert it into the database (at least in the
case I tried).
The fix adds a check that raises an error (file conversionLocale.cpp).
It also adds two CQDs (in the remaining code files) to suppress the
error and to replace the invalid character with a replacement character.
For now we use "?" as the replacement, even for Unicode, which defines
a dedicated replacement character (U+FFFD); see TRAFODION-2515.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zellerh/incubator-trafodion bug/cses_jan-17
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/986.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #986
----
commit 079b2107bc6c4475192d21deba9cac3f1f6687dd
Author: Hans Zeller
Date: 2017-03-01T22:33:44Z
[TRAFODION-2477] Invalid characters in translation are ignored
----
> Invalid characters in UCS2 to UTF8 translation are not handled correctly
> ------------------------------------------------------------------------
>
> Key: TRAFODION-2477
> URL: https://issues.apache.org/jira/browse/TRAFODION-2477
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Affects Versions: 2.0-incubating
> Reporter: Hans Zeller
> Assignee: Hans Zeller
> Fix For: 2.2-incubating
>
>
> When translating from UCS-2 to UTF-8, using CAST or TRANSLATE(...
> UCS2TOUTF8), every valid character maps directly to a UTF-8 character.
> However, invalid code points and invalid UTF-16 surrogate pairs have no
> UTF-8 equivalent, so the conversion could raise errors for them. Right
> now we just suppress those errors. Instead we should either translate
> them to the Unicode "replacement character" U+FFFD or raise an error.
> Ideally, a CQD should decide which of these two actions to take.
> Test case:
> create table tbaducs2(a char(10) character set ucs2);
> -- DC00 is a low-order UTF-16 surrogate, on its own this is invalid
> insert into tbaducs2 values(_ucs2 X'DC000041');
> select translate(a using ucs2toutf8) from tbaducs2;
> -- this returns an empty string - no error, no replacement character
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)