[jira] [Commented] (CALCITE-6006) RelToSqlConverter loses charset information

Mihai Budiu (Jira) Thu, 14 Sep 2023 16:42:05 -0700


    [ https://issues.apache.org/jira/browse/CALCITE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765390#comment-17765390 ]


Mihai Budiu commented on CALCITE-6006:
--------------------------------------

This one is harder to fix than I expected.

Just adding the charset to each SqlCharStringLiteral causes almost every 
literal produced by RelToSqlConverter to be printed with its charset prefix, 
which is _ISO-8859-1 by default. That breaks many tests that run the optimizer 
and then check the generated SQL, and it makes the generated SQL look ugly too.

My question is: when is it safe to omit the charset of a string literal from 
the emitted SQL?

I have a tentative implementation of SqlCharStringLiteral.unparse looking like 
this:

{code:java}
@Override public void unparse(
    SqlWriter writer,
    int leftPrec,
    int rightPrec) {
  final NlsString nlsString = getValueNonNull();
  // Only consider a prefix if the target dialect understands charsets
  boolean prefix = writer.getDialect().supportsCharSet();
  String charsetName = nlsString.getCharsetName();
  if (charsetName == null) {
    // The literal carries no charset, so there is nothing to print
    prefix = false;
  } else if (charsetName.equals(CalciteSystemProperty.DEFAULT_CHARSET.value())) {
    // Do not emit charset if it is the default
    prefix = false;
  }
  writer.literal(nlsString.asSql(prefix, true, writer.getDialect()));
}
{code}
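To make the rule above easier to discuss, here is a small Calcite-free model of the same decision. The names CharsetPrefixRule, emitPrefix and render are hypothetical; DEFAULT_CHARSET hard-codes the ISO-8859-1 default mentioned above instead of reading CalciteSystemProperty; and the quoting only doubles single quotes. It is a sketch of the decision logic under those assumptions, not of NlsString.asSql itself.

```java
/**
 * Hypothetical, Calcite-free model of the tentative unparse rule:
 * print the charset prefix only when the dialect supports charsets,
 * the literal carries a charset, and that charset is not the default.
 */
final class CharsetPrefixRule {
  /** Assumed stand-in for CalciteSystemProperty.DEFAULT_CHARSET.value(). */
  static final String DEFAULT_CHARSET = "ISO-8859-1";

  private CharsetPrefixRule() {}

  static boolean emitPrefix(boolean dialectSupportsCharSet, String charsetName) {
    return dialectSupportsCharSet
        && charsetName != null
        && !charsetName.equals(DEFAULT_CHARSET);
  }

  /** Simplified rendering: doubles embedded single quotes, no collation suffix. */
  static String render(String value, String charsetName, boolean dialectSupportsCharSet) {
    String quoted = "'" + value.replace("'", "''") + "'";
    return emitPrefix(dialectSupportsCharSet, charsetName)
        ? "_" + charsetName + quoted
        : quoted;
  }

  public static void main(String[] args) {
    System.out.println(render("abc", "UTF8", true));       // _UTF8'abc'
    System.out.println(render("abc", "ISO-8859-1", true)); // 'abc'  (default, omitted)
    System.out.println(render("abc", null, true));         // 'abc'  (no charset)
    System.out.println(render("abc", "UTF8", false));      // 'abc'  (dialect has no charsets)
  }
}
```

Because the default charset is omitted, literals that never had an explicit charset keep printing as plain 'abc', which is what would keep the existing optimizer tests stable; only non-default charsets such as _UTF8 become visible in the output.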

Is it reasonable to omit the charset from a string literal when it equals 
CalciteSystemProperty.DEFAULT_CHARSET?
Unfortunately, dialects do not have a "charset" property; comparing against the 
dialect's default charset would probably be the logical choice.
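If dialects did carry such a property, the comparison could be made against the target system's default instead of Calcite's global one. The sketch below assumes a hypothetical CharsetAwareDialect interface with a defaultCharset() method, plus hypothetical SimpleDialect and DialectDefaultRule helpers; none of these exist in Calcite's SqlDialect today.

```java
/** Hypothetical dialect abstraction; Calcite's SqlDialect has no defaultCharset(). */
interface CharsetAwareDialect {
  boolean supportsCharSet();

  /** Hypothetical: charset the target system assumes for unprefixed literals. */
  String defaultCharset();
}

/** Minimal hypothetical implementation used for illustration only. */
final class SimpleDialect implements CharsetAwareDialect {
  private final boolean supportsCharSet;
  private final String defaultCharset;

  SimpleDialect(boolean supportsCharSet, String defaultCharset) {
    this.supportsCharSet = supportsCharSet;
    this.defaultCharset = defaultCharset;
  }

  @Override public boolean supportsCharSet() { return supportsCharSet; }

  @Override public String defaultCharset() { return defaultCharset; }
}

final class DialectDefaultRule {
  private DialectDefaultRule() {}

  /** Emit a prefix only if the charset differs from the dialect's own default. */
  static boolean emitPrefix(CharsetAwareDialect dialect, String charsetName) {
    return dialect.supportsCharSet()
        && charsetName != null
        && !charsetName.equals(dialect.defaultCharset());
  }

  public static void main(String[] args) {
    CharsetAwareDialect latin1 = new SimpleDialect(true, "ISO-8859-1");
    CharsetAwareDialect utf8 = new SimpleDialect(true, "UTF8");
    System.out.println(emitPrefix(latin1, "UTF8"));     // true
    System.out.println(emitPrefix(utf8, "UTF8"));       // false
    System.out.println(emitPrefix(utf8, "ISO-8859-1")); // true
  }
}
```

With a per-dialect default, the same _UTF8 literal would print bare for a target whose default is UTF8 but keep its prefix for a target whose default is ISO-8859-1.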

> RelToSqlConverter loses charset information 
> --------------------------------------------
>
>                 Key: CALCITE-6006
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6006
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Mihai Budiu
>            Priority: Minor
>
> This is a bug in SqlImplementor: when it calls SqlLiteral.createCharString, it 
> does not pass any information about the charset of the source string. So a 
> string that looks like _UTF8'...' is converted to a string without the 
> charset in the generated SQL.
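The reported bug can be modeled in a few lines: if the conversion rebuilds a literal from its string value alone, an explicit charset such as _UTF8 disappears. CharLiteral, convertLossy and convertPreserving are hypothetical names for illustration; they stand in for the SqlImplementor call described above but are not Calcite APIs.

```java
/** Hypothetical stand-in for a character literal: a value plus an optional charset. */
final class CharLiteral {
  final String value;
  final String charsetName; // null means "no explicit charset"

  CharLiteral(String value, String charsetName) {
    this.value = value;
    this.charsetName = charsetName;
  }

  /** Prints the charset prefix whenever one is present (no default-omission rule here). */
  String toSql() {
    String quoted = "'" + value.replace("'", "''") + "'";
    return charsetName == null ? quoted : "_" + charsetName + quoted;
  }
}

final class ConversionDemo {
  private ConversionDemo() {}

  /** Mirrors the reported behavior: only the string value survives the conversion. */
  static CharLiteral convertLossy(CharLiteral source) {
    return new CharLiteral(source.value, null);
  }

  /** What a fix needs to do: carry the charset along with the value. */
  static CharLiteral convertPreserving(CharLiteral source) {
    return new CharLiteral(source.value, source.charsetName);
  }

  public static void main(String[] args) {
    CharLiteral source = new CharLiteral("abc", "UTF8");
    System.out.println(convertLossy(source).toSql());      // 'abc'
    System.out.println(convertPreserving(source).toSql()); // _UTF8'abc'
  }
}
```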



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
