[
https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597434#comment-16597434
]
Piotr Nowojski commented on FLINK-10257:
----------------------------------------
I can think of about 3 solutions to this problem:
# Drop {{CHAR}} support, and convert all string literals to {{VARCHAR}} in
SQL. Any attempt to cast a column to {{CHAR}} should return an exception.
# Provide internal {{CHAR}} support in SQL, while keeping it unsupported in
Table API
# Provide internal {{CHAR}} support both for SQL and Table API, potentially
changing type of string literals in Table API to {{CHAR}} as well to make it
consistent with SQL
For option 1., we could either:
* convince Calcite to do this
* or we would have to rewrite all {{CHAR}} types on our side in all RexNodes
and RelNodes (similarly, but a bit more complicated to
{{RelTimeIndicatorConverter}}
For option 2., we would need to properly support {{CHAR}} in all string
functions and comparisons, with respect to padding. Probably to make things
more consistent, we should make a contract that either we internally store
{{CHAR}} always padded or never padded (now it's semi random). For writing to
connectors we would need to cast all {{CHAR}} columns to {{VARCHAR}} which
would require trimming.
For option 3. we would additionally need to add support for {{CHAR}} in Table
API.
> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>
> Key: FLINK-10257
> URL: https://issues.apache.org/jira/browse/FLINK-10257
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: Piotr Nowojski
> Priority: Critical
>
> Despite that we officially do not support CHAR type, this type is visible and
> accessible for the users. First of all, string literals have default type of
> CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.
> Problem is that we do not support CHAR correctly. We mishandle it in:
> # comparisons and functions
> # writing values to sinks
> According to SQL standard (and as almost all of the other databases do), CHAR
> comparisons should ignore white spaces. On the other hand, functions like
> {{CONCAT}} or {{LENGTH}} shouldn't:
> [http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in In Flink we completely ignore those rules. Sometimes we store
> internally CHAR with padded spaces sometimes without. This results with semi
> random behaviour with respect to comparisons/functions/writing to sinks. For
> example following query:
>
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM
> sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi",
> "Hello", "Hello world", writes to sink not padded strings (correctly), but
> following query incorrectly
>
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable)
> WHERE c = 'Hi'")
> .insertInto("targetTable")
> env.execute(){code}
> Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}}
> is a NOOP in Flink, while 'Hi' constant handled by Calcite to us will be
> padded with 8 spaces.
> On the other hand following query produces strings padded with spaces:
>
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE
> 'this should not happen' END FROM sourceTable")
> .insertInto("targetTable")
> env.execute()
> val expected = Seq(
> "GERMANY",
> "POLAND",
> "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3
> elements and received 3
> expected: [GERMANY, POLAND, POLAND]
> received: [GERMANY , POLAND , POLAND ]
> {code}
>
> To make matter even worse, Calcite's constant folding correctly performs
> comparisons, while if same comparisons are performed by Flink, they yield
> different results. In other words in SQL:
> {code:java}
> SELECT 'POLAND' = 'POLAND '
> {code}
> return true, but same expression performed on columns
> {code:java}
> SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM
> countries{code}
> returns false.
> To further complicated things, in SQL our string literals have {{CHAR}} type,
> while in Table API our literals have String type (effectively {{VARCHAR}})
> making results inconsistent between those two APIs.
>
> CC [~twalthr] [~fhueske] [~hequn8128]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)