Quanlong Huang created IMPALA-13602:
---------------------------------------

             Summary: Add UNICODE() function to return the integer value of the codepoint
                 Key: IMPALA-13602
                 URL: https://issues.apache.org/jira/browse/IMPALA-13602
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Quanlong Huang


It'd be interesting to add a unicode() function that returns the integer value
of the codepoint, similar to what SQL Server supports:
https://learn.microsoft.com/en-us/sql/t-sql/functions/unicode-transact-sql?view=sql-server-ver16

An example usage:
{noformat}
[localhost:21050] default> select "\ud840\udc0b";
𠀋
[localhost:21050] default> select unicode("\ud840\udc0b");
131083
[localhost:21050] default> select hex(unicode("\ud840\udc0b"));
2000B
{noformat}
Note that U+2000B is 𠀋: https://symbl.cc/en/2000B/
The string literal is processed in the FE, which runs in Java and therefore
uses UTF-16. U+2000B is encoded as the surrogate pair "\ud840\udc0b" in UTF-16BE.
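
To make the surrogate-pair arithmetic concrete, here is a small Python sketch of how the two UTF-16 code units above map back to the single codepoint 131083. It only illustrates the encoding math and is not a proposal for how Impala should implement the function; the helper name combine_surrogates is hypothetical.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration only; not part of Impala.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD840, 0xDC0B)
print(code_point)         # 131083
print(f"{code_point:X}")  # 2000B

# Round trip: encoding the code point as UTF-16BE gives the same two units.
print(chr(code_point).encode("utf-16-be").hex())  # d840dc0b
```

The printed values match the unicode() and hex(unicode()) results in the example session above.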



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
