[ https://issues.apache.org/jira/browse/FLINK-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545114#comment-16545114 ]

ASF GitHub Bot commented on FLINK-5750:
---------------------------------------

GitHub user AlexanderKoltsov opened a pull request:

    https://github.com/apache/flink/pull/6341

    [FLINK-5750] Incorrect translation of n-ary Union

    ## What is the purpose of the change
    
    *This pull request adds support for multiple inputs (n-ary unions) in DataSetUnionRule and DataStreamUnionRule.*
    
    
    ## Brief change log
    
      - *DataSetUnionRule and DataStreamUnionRule now consider all inputs of a union instead of only the first and second.*
    
    
    ## Verifying this change
    
    *This change added the following tests:*
    - *Added the unit test `testValuesWithCast`, which validates the VALUES operator with values that have to be cast. The plan optimizer transforms this query into a UNION of VALUES because the VALUES arguments are not literal values.*
    - *Also added a plan test for `testValuesWithCast`.*
    
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlexanderKoltsov/flink bug/flink-5750

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6341.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6341
    
----
commit 04a7efc99033d5740d2140aa2978cb4a3b7ae38b
Author: Alexander Koltsov <alexander_koltsov@...>
Date:   2018-07-10T13:45:12Z

    [FLINK-5750] Incorrect translation of n-ary Union
    
    Calcite's union operator supports more than two input relations.
    However, Flink's translation rules only considered the first two relations
    because we assumed that Calcite's union is binary.
    This problem exists for both batch and streaming queries.

----
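A minimal sketch of the difference described in the commit message, assuming a simplified plan model: `Node`, `Input` and `BinaryUnion` are hypothetical stand-ins, not Flink or Calcite classes. Folding the inputs into a left-deep chain of binary unions is only one possible way to translate an n-ary union with binary operators; it is not necessarily what the patch does.

```java
import java.util.List;

// Hypothetical sketch; not Flink's actual DataSetUnionRule/DataStreamUnionRule code.
public class NaryUnionSketch {

    // Minimal stand-in for a plan node: either a leaf input or a binary union.
    interface Node { String explain(); }

    record Input(String name) implements Node {
        public String explain() { return name; }
    }

    record BinaryUnion(Node left, Node right) implements Node {
        public String explain() {
            return "Union(" + left.explain() + ", " + right.explain() + ")";
        }
    }

    // Buggy behavior: assumes the logical union is binary, so inputs 3..n are silently dropped.
    public static Node translateFirstTwo(List<Node> inputs) {
        return new BinaryUnion(inputs.get(0), inputs.get(1));
    }

    // Fixed behavior (one option): fold all n inputs into a left-deep chain of binary unions.
    public static Node translateAll(List<Node> inputs) {
        if (inputs.size() < 2) {
            throw new IllegalArgumentException("a union needs at least two inputs");
        }
        Node result = inputs.get(0);
        for (int i = 1; i < inputs.size(); i++) {
            result = new BinaryUnion(result, inputs.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> inputs = List.of(new Input("Values1"), new Input("Values2"), new Input("Values3"));
        System.out.println(translateFirstTwo(inputs).explain()); // Union(Values1, Values2)
        System.out.println(translateAll(inputs).explain());      // Union(Union(Values1, Values2), Values3)
    }
}
```

The sketch uses records, so it needs Java 16 or later. A left-deep chain keeps every physical operator binary while still covering all n inputs, which is why it is a natural fit when the execution side only offers a two-input union.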


> Incorrect translation of n-ary Union
> ------------------------------------
>
>                 Key: FLINK-5750
>                 URL: https://issues.apache.org/jira/browse/FLINK-5750
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.2.0, 1.3.4, 1.5.0, 1.4.2, 1.6.0
>            Reporter: Anton Mushin
>            Assignee: Alexander Koltsov
>            Priority: Critical
>              Labels: pull-request-available
>
> Calcite's union operator supports more than two input relations. However, 
> Flink's translation rules only consider the first two relations because we 
> assumed that Calcite's union is binary. 
> This problem exists for both batch and streaming queries.
> It seems that Calcite only generates non-binary unions in rare cases 
> ({{(SELECT * FROM t) UNION ALL (SELECT * FROM t) UNION ALL (SELECT * FROM t)}} 
> results in two binary union operators), but the problem definitely needs to be fixed.
> The following test can be used to reproduce the problem. 
> {code:java}
> @Test
> public void testValuesWithCast() throws Exception {
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env, config());
>
>     // The casts make the row values non-literal, so the optimizer plans this
>     // VALUES clause as a UNION of VALUES (the case that triggers the bug).
>     String sqlQuery = "VALUES (1, cast(1 as BIGINT))," +
>         "(2, cast(2 as BIGINT))," +
>         "(3, cast(3 as BIGINT))";
>     // The same rows with plain literals, used as a control.
>     String sqlQuery2 = "VALUES (1,1)," +
>         "(2, 2)," +
>         "(3, 3)";
>
>     Table result = tableEnv.sql(sqlQuery);
>     DataSet<Row> resultSet = tableEnv.toDataSet(result, Row.class);
>     List<Row> results = resultSet.collect();
>
>     Table result2 = tableEnv.sql(sqlQuery2);
>     DataSet<Row> resultSet2 = tableEnv.toDataSet(result2, Row.class);
>     List<Row> results2 = resultSet2.collect();
>
>     String expected = "1,1\n2,2\n3,3";
>     compareResultAsText(results2, expected); // control query
>     compareResultAsText(results, expected);  // fails: only 2 of the 3 rows are returned
> }
> {code}
> Actual result for the {{results}} variable:
> {noformat}
> java.lang.AssertionError: Different elements in arrays: expected 3 elements and received 2
>  expected: [1,1, 2,2, 3,3]
>  received: [1,1, 2,2] 
> Expected :3
> Actual   :2
> {noformat}
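The reported result is consistent with the three-row VALUES being planned as a three-input union of which only the first two inputs are translated. The following plain-Java simulation reproduces the row counts under stated assumptions: the class and method names are hypothetical, Java collections stand in for the Flink API, and the VALUES clause is assumed to be split into one single-row input per tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the reported symptom; plain Java collections, not the Flink API.
public class UnionRowCountSketch {

    // UNION ALL of two inputs: concatenate their rows, keeping duplicates.
    public static List<String> unionAll(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Buggy: only inputs 0 and 1 of the n-ary union are translated.
    public static List<String> firstTwoOnly(List<List<String>> inputs) {
        return unionAll(inputs.get(0), inputs.get(1));
    }

    // Fixed: every input contributes its rows.
    public static List<String> allInputs(List<List<String>> inputs) {
        List<String> out = new ArrayList<>();
        for (List<String> input : inputs) {
            out = unionAll(out, input);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed plan shape: one single-row input per VALUES tuple.
        List<List<String>> inputs = List.of(List.of("1,1"), List.of("2,2"), List.of("3,3"));
        System.out.println(firstTwoOnly(inputs)); // [1,1, 2,2]
        System.out.println(allInputs(inputs));    // [1,1, 2,2, 3,3]
    }
}
```

The buggy variant prints the same two rows as the "received" line of the assertion error, while the fixed variant returns all three expected rows.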



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
