[jira] [Commented] (CALCITE-6001) Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of non-ISO-8859-1 characters

Julian Hyde (Jira) Tue, 17 Oct 2023 13:52:16 -0700


    [ https://issues.apache.org/jira/browse/CALCITE-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776385#comment-17776385 ]


Julian Hyde commented on CALCITE-6001:
--------------------------------------

[~tanclary], I saw the PR for 3993 but I think it would still make sense to 
merge this PR (for 6001) first. Do you agree?

If so, can you do some cleanup? There are trailing spaces etc. which would
break lint. It looks as if the {{useUtf8AsDefaultCharset}} method is not
needed, as it has been superseded by {{getCharset}}. Also, please extend the
parser test a little, so that it tests conversion back to SQL not just in the
BigQuery dialect but also in the Calcite dialect.

> Add useUtf8AsDefaultCharset flag to SqlConformanceEnum to allow encoding of 
> non-ISO-8859-1 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6001
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6001
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Tanner Clary
>            Assignee: Tanner Clary
>            Priority: Major
>              Labels: pull-request-available
>
> Many dialects supported by Calcite encode their strings using a default 
> charset (most commonly UTF-8 or ISO-8859-1). For example, BigQuery uses 
> [UTF-8|https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type].
>  I am proposing to add a dialect property to be referenced when converting 
> string literals so that the current dialect's default is used unless 
> otherwise specified.
> Presently, if no charset is specified when converting to RexLiterals 
> [here|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rex/RexBuilder.java#L1618],
>  the CalciteSystemProperty {{DEFAULT_CHARSET}} is used 
> ([docs|https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java#L300])
>  which is set as ISO-8859-1.
> This means that when converting a query like:
> {{select 'ק' as result;}}
>  you will get the following error: {{Failed to encode 'ק' in character 
> set 'ISO-8859-1'}}.
> This failure is unexpected if you are using BigQuery conformance (or any 
> dialect whose default is UTF-8).
> Of course, an alternative solution would be to just change the Calcite
> default to UTF-8, which can encode any Unicode character, whereas
> ISO-8859-1 can only encode the first 256 code points, but I imagine there
> are reasons against this.
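
The encoding failure described above can be reproduced with the JDK alone. The following is a minimal sketch, not Calcite's actual code path: the class name {{CharsetCheck}} and the helper {{canEncode}} are hypothetical, and the only library call used is the standard {{CharsetEncoder.canEncode}}. It checks whether the character from the example query is representable in each of the two charsets under discussion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of Calcite.
final class CharsetCheck {
  /** Returns whether every character of {@code s} can be encoded in {@code charset}. */
  static boolean canEncode(String s, Charset charset) {
    // Create a fresh encoder per call; CharsetEncoder instances are not thread-safe.
    return charset.newEncoder().canEncode(s);
  }

  public static void main(String[] args) {
    String qof = "\u05E7"; // 'ק', the character from the example query

    // ISO-8859-1 covers only U+0000..U+00FF, so U+05E7 is not representable.
    System.out.println(canEncode(qof, StandardCharsets.ISO_8859_1)); // false

    // UTF-8 can encode every Unicode code point.
    System.out.println(canEncode(qof, StandardCharsets.UTF_8)); // true

    // A character inside the first 256 code points, such as U+00E9, works in both.
    System.out.println(canEncode("\u00E9", StandardCharsets.ISO_8859_1)); // true
  }
}
```

A check of this shape is presumably what lies behind the {{Failed to encode}} error: the literal is valid in the dialect's own default charset but not in the ISO-8859-1 default that is applied when no charset is specified.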



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
