[
https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896975#comment-16896975
]
Timo Walther commented on FLINK-10257:
--------------------------------------
For you information, I also did some testing of different systems around this
topic:
{code}
Flink:
SELECT '>' || name || '<' FROM (SELECT 'Bob' AS name UNION ALL SELECT 'Alice'
AS name);
>Alice<
>Bob<
SELECT '>' || name || '<' FROM (VALUES ('Bob'), ('Alice')) AS NameTable(name);
>Bob <
>Alice<
SELECT '>' || case 1 when 1 then 'a' when 2 then 'bcd' end || '<' FROM DUAL;
>a <
SQL Server:
SELECT '>' + name + '<' FROM (VALUES ('Bob'), ('Alice')) AS NameTable(name);
>Bob<
>Alice<
PostgreSQL:
SELECT '>' || name || '<' FROM (VALUES ('Bob'), ('Alice')) AS NameTable(name);
>Bob<
>Alice<
SQLLite:
SELECT '>' || column1 || '<' FROM (VALUES ('Bob'), ('Alice'));
>Bob<
>Alice<
Oracle:
SELECT '>' || case 1 when 1 then 'a' when 2 then 'bcd' end || '<' FROM DUAL;
>a<
{code}
This issue popped up a couple of times in the past and is currently different
between Flink and Blink planner. I observed it again while working on
FLINK-13273. I would vote for returning VARCHAR when finding the common type in
Blink, although this would mean that we diverge the two planners here. This
JIRA issue still needs a long-term solution and should be solved in the next
release. I will mark it as a blocker.
[~ykt836] [~jark] [~dawidwys] [~aljoscha]
> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>
> Key: FLINK-10257
> URL: https://issues.apache.org/jira/browse/FLINK-10257
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / API
> Reporter: Piotr Nowojski
> Assignee: Hequn Cheng
> Priority: Critical
>
> Despite that we officially do not support CHAR type, this type is visible and
> accessible for the users. First of all, string literals have default type of
> CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.
> Problem is that we do not support CHAR correctly. We mishandle it in:
> # comparisons and functions
> # writing values to sinks
> According to SQL standard (and as almost all of the other databases do), CHAR
> comparisons should ignore white spaces. On the other hand, functions like
> {{CONCAT}} or {{LENGTH}} shouldn't:
> [http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in In Flink we completely ignore those rules. Sometimes we store
> internally CHAR with padded spaces sometimes without. This results with semi
> random behaviour with respect to comparisons/functions/writing to sinks. For
> example following query:
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM
> sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> Where `sourceTable` has single {{VARCHAR(10)}} column with values: "Hi",
> "Hello", "Hello world", writes to sink not padded strings (correctly), but
> following query:
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable)
> WHERE c = 'Hi'")
> .insertInto("targetTable")
> env.execute(){code}
> Incorrectly filters out all of the results, because {{CAST(s AS CHAR(10))}}
> is a NOOP in Flink, while 'Hi' constant handed by Calcite to us will be
> padded with 8 spaces.
> On the other hand following query produces strings padded with spaces:
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE
> 'this should not happen' END FROM sourceTable")
> .insertInto("targetTable")
> env.execute()
> val expected = Seq(
> "GERMANY",
> "POLAND",
> "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3
> elements and received 3
> expected: [GERMANY, POLAND, POLAND]
> received: [GERMANY , POLAND , POLAND ]
> {code}
> To make matter even worse, Calcite's constant folding correctly performs
> comparisons, while if same comparisons are performed by Flink, they yield
> different results. In other words in SQL:
> {code:java}
> SELECT 'POLAND' = 'POLAND '
> {code}
> return true, but same expression performed on columns
> {code:java}
> SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM
> countries{code}
> returns false.
> To further complicated things, in SQL our string literals have {{CHAR}} type,
> while in Table API our literals have String type (effectively {{VARCHAR}})
> making results inconsistent between those two APIs.
>
> CC [~twalthr] [~fhueske] [~hequn8128]
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)