MartijnVisser commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2022804099

   > What is your opinion on how the function should behave?
   
   I've taken a look at how INTERSECT is defined in the SQL standard. Based on 
https://stackoverflow.com/questions/59060599/does-intersect-operator-exist-in-the-sql-standar,
  https://www.postgresql.org/docs/current/queries-union.html, and the fact that 
Calcite differentiates between INTERSECT and INTERSECT ALL, I believe 
that the default behavior of INTERSECT is to remove duplicates. 
   
   So, in my understanding, `[1, 1, 1, 2] INTERSECT [1, 1, 2]` should return 
`[1, 2]`. I think that Spark/Databricks/Presto implement the correct 
behavior. 
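
   To make the two candidate semantics concrete, here is a minimal sketch in 
Python. It is not Flink's implementation; the helper names 
`intersect_distinct` and `intersect_all` are hypothetical and only illustrate 
the difference between deduplicating (INTERSECT-like) and multiset 
(INTERSECT ALL-like) behavior on the arrays from the example above.

```python
from collections import Counter


def intersect_distinct(left, right):
    """Set-style intersection (analogous to SQL INTERSECT).

    Each element present in both inputs appears once, in the order
    it is first seen in `left`.
    """
    right_values = set(right)
    seen = set()
    result = []
    for value in left:
        if value in right_values and value not in seen:
            seen.add(value)
            result.append(value)
    return result


def intersect_all(left, right):
    """Multiset-style intersection (analogous to SQL INTERSECT ALL).

    Each element is kept min(count in left, count in right) times.
    """
    remaining = Counter(right)  # how many copies of each value are still unmatched
    result = []
    for value in left:
        if remaining[value] > 0:
            remaining[value] -= 1
            result.append(value)
    return result


if __name__ == "__main__":
    print(intersect_distinct([1, 1, 1, 2], [1, 1, 2]))  # [1, 2]
    print(intersect_all([1, 1, 1, 2], [1, 1, 2]))       # [1, 1, 2]
```

   The first function matches the `[1, 2]` result argued for here; the second 
shows what a duplicate-preserving variant would return instead.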
   
   BigQuery and Redshift don't support ARRAY_INTERSECT. ksqlDB follows the same 
behavior as Spark/Databricks/Presto per 
https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/scalar-functions/#array_intersect.
 

