[ 
https://issues.apache.org/jira/browse/FLINK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825176#comment-17825176
 ] 

Sergey Nuyanzin edited comment on FLINK-31663 at 3/11/24 7:25 AM:
------------------------------------------------------------------

{quote}
Since the behavior of array_union is already aligned with Spark's (without 
duplicates), if we don't align here, the logic of the entire function would 
seem inconsistent. Should we change the behavior of array_union? But if we 
change it, it will cause a version compatibility problem.
{quote}
I don't think we should copy everything that is present in Spark.

There are also Snowflake, ClickHouse, PostgreSQL, DuckDB, etc.

{{ARRAY_EXCEPT}} keeps duplicates (as in Snowflake), which covers some cases 
that the duplicate-eliminating version cannot. If duplicates need to be 
eliminated, there is {{ARRAY_DISTINCT}}.
Flink already follows this approach.
Yes, there is {{ARRAY_UNION}}, which eliminates duplicates.
However, there is also {{ARRAY_CONCAT}}, which concatenates arrays without 
eliminating duplicates; moreover, it can concatenate more than 2 arrays at once 
(as in BigQuery, ClickHouse, DuckDB).
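
To make the difference concrete, here is a minimal Python sketch of the two semantics. It is not Flink's implementation: the function names are hypothetical, and it assumes "keeps duplicates" means a multiset difference, where each element of the second array cancels one occurrence in the first.

```python
from collections import Counter


def array_except_keep_duplicates(left, right):
    """Hypothetical model: each element of `right` cancels ONE occurrence in `left`."""
    to_remove = Counter(right)
    result = []
    for item in left:
        if to_remove[item] > 0:
            to_remove[item] -= 1  # consume one matching occurrence
        else:
            result.append(item)   # keep the element, order preserved
    return result


def array_distinct(values):
    """Hypothetical model: drop repeated elements, keeping first occurrences."""
    seen = set()
    result = []
    for item in values:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def array_except_without_duplicates(left, right):
    """Hypothetical model of the duplicate-eliminating variant (set difference)."""
    excluded = set(right)
    return array_distinct([item for item in left if item not in excluded])


kept = array_except_keep_duplicates([1, 1, 2, 3, 3], [1, 3])
deduplicated = array_distinct(array_except_keep_duplicates([1, 1, 2, 2], [1]))
set_like = array_except_without_duplicates([1, 1, 2, 2], [1])
```

Under these assumptions, the duplicate-keeping variant can express "remove one occurrence", and wrapping it in the distinct step gives a duplicate-free result when needed. The duplicate-eliminating variant cannot recover the occurrence counts it discards.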



> Add ARRAY_EXCEPT supported in SQL & Table API
> ---------------------------------------------
>
>                 Key: FLINK-31663
>                 URL: https://issues.apache.org/jira/browse/FLINK-31663
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: luoyuxia
>            Assignee: Hanyu Zheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
