snuyanzin commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2123898530
>@snuyanzin @MartijnVisser Why do we necessarily have to align our semantics
with Snowflake?
As mentioned above, the main reason is that multi-set semantics
(as in Snowflake) can handle cases with duplicates, e.g.
```sql
SELECT array_count(
  array_intersect(array('A', 'B', 'B'), array('B', 'B', 'B'))
); -- returns 2
```
and the result without duplicates is still available via `array_distinct`:
```sql
SELECT array_count(
  array_distinct(
    array_intersect(array('A', 'B', 'B'), array('B', 'B', 'B'))
  )
); -- returns 1
```
It seems that Spark and others cannot compute the number of duplicates with
their set semantics, and I would consider that the main drawback of their
approach. Please correct me if I'm wrong, @liuyongvs
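
To make the difference between the two semantics concrete, here is a minimal Python sketch. It only models the behaviour discussed above and is not the Flink, Snowflake, or Spark implementation. The helper names `multiset_intersect` and `set_intersect` are hypothetical, and preserving the order of the left array is an assumption made for readability.

```python
from collections import Counter


def multiset_intersect(left, right):
    """Multi-set semantics: keep each element min(count in left, count in right) times."""
    remaining = Counter(right)  # how many copies of each element are still unmatched
    result = []
    for item in left:
        if remaining[item] > 0:
            result.append(item)
            remaining[item] -= 1
    return result


def set_intersect(left, right):
    """Set semantics: every common element appears once (assumed left-array order)."""
    right_items = set(right)
    seen = set()
    result = []
    for item in left:
        if item in right_items and item not in seen:
            seen.add(item)
            result.append(item)
    return result


left = ['A', 'B', 'B']
right = ['B', 'B', 'B']

print(len(multiset_intersect(left, right)))  # 2
print(len(set_intersect(left, right)))       # 1
```

In this sketch, the set-style result can always be derived from the multi-set result by deduplicating it, while the duplicate count cannot be recovered once the set-style result has discarded it.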
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]