[
https://issues.apache.org/jira/browse/CALCITE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benchao Li closed CALCITE-6051.
-------------------------------
> Incorrect translation for unicode strings in SqlDialect's
> quoteStringLiteralUnicode method for HiveSqlDialect and SparkSqlDialect
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-6051
> URL: https://issues.apache.org/jira/browse/CALCITE-6051
> Project: Calcite
> Issue Type: Bug
> Reporter: Shivangi
> Priority: Major
> Attachments: image-2023-10-16-18-54-53-483.png
>
>
> Hi,
> The Unicode string literals generated by Calcite are in a format that Hive
> and Spark cannot parse. For example, the string `Conveniência` is converted
> into `u&'Conveni\00eancia'`. The `u&` prefix comes from the
> `quoteStringLiteralUnicode` method in
> calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:
> {code:java}
> /**
>  * Converts a string into a unicode string literal. For example,
>  * <code>can't{tab}run\</code> becomes <code>u'can''t\0009run\\'</code>.
>  */
> public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
>   buf.append("u&'");
>   for (int i = 0; i < val.length(); i++) {
>     char c = val.charAt(i);
>     if (c < 32 || c >= 128) {
>       buf.append('\\');
>       buf.append(HEXITS[(c >> 12) & 0xf]);
>       buf.append(HEXITS[(c >> 8) & 0xf]);
>       buf.append(HEXITS[(c >> 4) & 0xf]);
>       buf.append(HEXITS[c & 0xf]);
>     } else if (c == '\'' || c == '\\') {
>       buf.append(c);
>       buf.append(c);
>     } else {
>       buf.append(c);
>     }
>   }
>   buf.append("'");
> }
> {code}
> Queries that contain a literal in this encoding fail.
> I also tested the query you shared on Hive and Spark:
> Hive:
> {code:java}
> select u&'hello world';
> Error: Error while compiling statement: FAILED: SemanticException [Error
> 10004]: Line 1:7 Invalid table alias or column reference 'u': (possible
> column names are: ) (state=42000,code=10004)
> {code}
> Spark:
> {code:java}
> select u&'hello world';
> User class threw exception: org.apache.spark.sql.AnalysisException: cannot
> resolve 'u' given input columns: []; line 1 pos 7;
> {code}
> This is HiveSqlDialect:
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/HiveSqlDialect.java
> HiveSqlDialect does not override the `quoteStringLiteralUnicode` method of
> SqlDialect.
> The corresponding SparkSqlDialect:
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql/dialect/SparkSqlDialect.java
>
> *Ask:*
> Why does this method call `buf.append("u&'")`? I couldn't find any Unicode
> escape syntax that uses `u&`, so the generated literal breaks when the
> client reads it. I would like to understand why `u&` is used and what could
> break if we remove the `&`.
> Thanks!
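
To make the report easier to reproduce outside Calcite, here is a minimal standalone sketch. `quoteWithUAmpersand` mirrors the logic of the method quoted above (with a lowercase hex table assumed for `HEXITS`, matching the `\00ea` output in the report). `quoteWithBackslashU` is a hypothetical alternative, not Calcite's API and not necessarily the fix applied for this issue; it assumes the target engine accepts `\uXXXX` and backslash escapes inside an ordinary single-quoted literal. The class and method names are made up for illustration.

```java
// Standalone sketch; not part of Calcite. Names are hypothetical.
class UnicodeLiteralSketch {
  // Assumed lowercase hex digits, consistent with the output in the report.
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  // Appends the four hex digits of a UTF-16 code unit.
  private static void appendHex4(StringBuilder buf, char c) {
    buf.append(HEXITS[(c >> 12) & 0xf]);
    buf.append(HEXITS[(c >> 8) & 0xf]);
    buf.append(HEXITS[(c >> 4) & 0xf]);
    buf.append(HEXITS[c & 0xf]);
  }

  // Mirrors the quoted method: u&'...' prefix, backslash plus four hex
  // digits for control and non-ASCII characters, doubled quote/backslash.
  static String quoteWithUAmpersand(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\');
        appendHex4(buf, c);
      } else if (c == '\'' || c == '\\') {
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  // Hypothetical alternative (assumption): no prefix, backslash-u escapes,
  // and backslash-escaped quote/backslash inside a plain quoted literal.
  static String quoteWithBackslashU(String val) {
    StringBuilder buf = new StringBuilder("'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        buf.append('\\').append('u');
        appendHex4(buf, c);
      } else if (c == '\'' || c == '\\') {
        buf.append('\\').append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  public static void main(String[] args) {
    // The e-circumflex is built from its code point to avoid any
    // dependence on the source file encoding.
    String input = "Conveni" + (char) 0xEA + "ncia";
    System.out.println(quoteWithUAmpersand(input));
    System.out.println(quoteWithBackslashU(input));
  }
}
```

The first line printed is the literal form that Hive and Spark reject in the examples above; the second is only a candidate form and should be checked against the target engine before relying on it.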