Quanlong Huang created IMPALA-13602:
---------------------------------------
Summary: Add UNICODE() function to return the integer value of the
codepoint
Key: IMPALA-13602
URL: https://issues.apache.org/jira/browse/IMPALA-13602
Project: IMPALA
Issue Type: New Feature
Reporter: Quanlong Huang
It'd be interesting to add a unicode() function that returns the integer value
of the codepoint, similar to what SQL Server supports:
https://learn.microsoft.com/en-us/sql/t-sql/functions/unicode-transact-sql?view=sql-server-ver16
An example usage:
{noformat}
[localhost:21050] default> select "\ud840\udc0b";
𠀋
[localhost:21050] default> select unicode("\ud840\udc0b");
131083
[localhost:21050] default> select hex(unicode("\ud840\udc0b"));
2000B
{noformat}
Note that U+2000B is 𠀋: https://symbl.cc/en/2000B/
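For illustration, the intended semantics can be sketched in Python. The helper name unicode_value is hypothetical. Taking only the first character follows SQL Server's UNICODE(); returning None for an empty string is an assumption, not a settled design for Impala.

```python
from typing import Optional


def unicode_value(s: str) -> Optional[int]:
    """Return the codepoint of the first character of s.

    Hypothetical helper: returning None for empty input is an assumption.
    """
    if not s:
        return None
    # Python strings are sequences of codepoints, so s[0] is a whole
    # character even when it lies outside the Basic Multilingual Plane.
    return ord(s[0])


print(unicode_value("\U0002000B"))               # 131083
print(format(unicode_value("\U0002000B"), "X"))  # 2000B
```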
The string constant is processed in the FE, which runs Java code and therefore
uses UTF-16. U+2000B is encoded as the surrogate pair "\ud840\udc0b" in UTF-16BE.
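The relation between the surrogate pair and the codepoint can be checked with the standard UTF-16 arithmetic. This is a minimal sketch in Python; the function name combine_surrogates is hypothetical and only illustrates the encoding rule, not how Impala would implement the function.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one codepoint."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD840, 0xDC0B)
print(cp, format(cp, "X"))                # 131083 2000B
print(chr(cp).encode("utf-16-be").hex())  # d840dc0b
```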