Shivangi created CALCITE-6051:
---------------------------------
Summary: Incorrect format for unicode strings
Key: CALCITE-6051
URL: https://issues.apache.org/jira/browse/CALCITE-6051
Project: Calcite
Issue Type: Bug
Reporter: Shivangi
Hi,
Unicode strings returned by Calcite come back in a format that breaks when read
by the client. For example, the string `Conveniência` is converted into
`u&'Conveni\00eancia'`. The `u&` prefix comes from the
`quoteStringLiteralUnicode` method in
calcite-core-1.2.0-incubating-sources.jar!/org/apache/calcite/sql/SqlDialect.java:
{code:java}
/**
 * Converts a string into a unicode string literal. For example,
 * <code>can't{tab}run\</code> becomes <code>u'can''t\0009run\\'</code>.
 */
public void quoteStringLiteralUnicode(StringBuilder buf, String val) {
  buf.append("u&'");
  for (int i = 0; i < val.length(); i++) {
    char c = val.charAt(i);
    if (c < 32 || c >= 128) {
      buf.append('\\');
      buf.append(HEXITS[(c >> 12) & 0xf]);
      buf.append(HEXITS[(c >> 8) & 0xf]);
      buf.append(HEXITS[(c >> 4) & 0xf]);
      buf.append(HEXITS[c & 0xf]);
    } else if (c == '\'' || c == '\\') {
      buf.append(c);
      buf.append(c);
    } else {
      buf.append(c);
    }
  }
  buf.append("'");
}
{code}
Why does this method add the `u&'` prefix via `buf.append("u&'")`? I couldn't
find any relevant Unicode notation that uses `u&`, and the resulting literal
breaks when read by the client. I would like to understand why `u&` is used and
what could break if we remove the `&`.
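To make the behavior easy to reproduce outside Calcite, here is a minimal standalone sketch of the same escaping logic. The class name `UnicodeQuoteDemo`, the `quote` helper, and the lowercase `HEXITS` table are assumptions made for illustration; they are not copied from the Calcite source, although the lowercase hex digits match the `\00ea` seen in the output above.

```java
class UnicodeQuoteDemo {
  // Assumed lowercase hex table (hypothetical stand-in for Calcite's HEXITS).
  private static final char[] HEXITS = "0123456789abcdef".toCharArray();

  // Hypothetical helper mirroring the loop in quoteStringLiteralUnicode.
  static String quote(String val) {
    StringBuilder buf = new StringBuilder("u&'");
    for (int i = 0; i < val.length(); i++) {
      char c = val.charAt(i);
      if (c < 32 || c >= 128) {
        // Control and non-ASCII chars become a backslash plus four hex digits.
        buf.append('\\');
        buf.append(HEXITS[(c >> 12) & 0xf]);
        buf.append(HEXITS[(c >> 8) & 0xf]);
        buf.append(HEXITS[(c >> 4) & 0xf]);
        buf.append(HEXITS[c & 0xf]);
      } else if (c == '\'' || c == '\\') {
        // Single quotes and backslashes are doubled.
        buf.append(c).append(c);
      } else {
        buf.append(c);
      }
    }
    return buf.append('\'').toString();
  }

  public static void main(String[] args) {
    // "\u00ea" is the character ê, so this is the string Conveniência.
    System.out.println(quote("Conveni\u00eancia")); // prints u&'Conveni\00eancia'
  }
}
```

With this sketch, `Conveniência` yields the same `u&'Conveni\00eancia'` literal that the client receives.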
Thanks!