Re: [PR] [FLINK-31664] Implement ARRAY_INTERSECT function [flink]

via GitHub Tue, 21 May 2024 22:32:07 -0700


snuyanzin commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2123898530


   > @snuyanzin @MartijnVisser Why do we necessarily have to align our semantics with Snowflake?
   
   As mentioned above, the main reason is that multiset semantics (as in 
Snowflake) make it possible to handle cases with duplicates, e.g. 
   ```sql
   SELECT array_count(array_intersect(array('A', 'B', 'B'), array('B', 'B', 'B'))); -- returns 2
   ```
   and without duplicates:
   ```sql
   SELECT array_count(array_distinct(array_intersect(array('A', 'B', 'B'), array('B', 'B', 'B')))); -- returns 1
   ```
   
   It seems that Spark and others cannot compute the number of duplicates with 
their semantics, which I would consider the main drawback of their approach. 
Please correct me if I'm wrong, @liuyongvs.
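
To make the difference concrete, here is a minimal Python sketch of the two semantics. It is only an illustration, not the Flink implementation: the function names `intersect_multiset` and `intersect_set` are hypothetical, and preserving the order of the left array is an assumption made for readability.

```python
from collections import Counter


def intersect_multiset(left, right):
    """Multiset semantics: keep each element min(count in left, count in right) times."""
    remaining = Counter(right)  # how many copies of each element are still unmatched
    result = []
    for item in left:
        if remaining[item] > 0:
            remaining[item] -= 1
            result.append(item)
    return result


def intersect_set(left, right):
    """Set semantics: keep each common element once."""
    right_items = set(right)
    seen = set()
    result = []
    for item in left:
        if item in right_items and item not in seen:
            seen.add(item)
            result.append(item)
    return result


left, right = ["A", "B", "B"], ["B", "B", "B"]
print(len(intersect_multiset(left, right)))  # 2
print(len(intersect_set(left, right)))       # 1
```

The set result can always be derived from the multiset result by removing duplicates, but the duplicate count cannot be recovered from the set result.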


