snuyanzin commented on PR #24526:
URL: https://github.com/apache/flink/pull/24526#issuecomment-2105169563

   Maybe this is an unpopular opinion,
   however I tend to think that `INTERSECT` vs `INTERSECT ALL` (and likewise the 
other set operators), i.e. set vs bag semantics, is defined for rows and can hardly 
be applied to collections. I failed to find a standard approach for collections 
like arrays (I mean in the SQL standard). 
   That is probably one of the reasons why vendors handle this 
differently. On the one hand, we could say that we should follow the 
same approach as for rows. The problem I see is that by default we would then 
remove duplicates, but what should we do if we want to keep them? No 
well-known vendor provides both `array_intersect` and `array_intersect_all`, 
or a keep/remove-duplicates parameter. 
   On the other hand, if we keep duplicates then we are still able to cover 
both cases.
   For the case with duplicates we can do
   ```
   array_intersect(array1, array2)
   ```
   and for the case without duplicates
   ```
   array_distinct(array_intersect(array1, array2))
   ```
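
   To make the two cases concrete, here is a small Python sketch of the semantics I have in mind. It is only an illustration, not the Flink implementation: the Python functions merely borrow the SQL names, and I am assuming that "keeping duplicates" means bag semantics in the spirit of `INTERSECT ALL` (each element is kept as many times as it occurs in both arrays). The PR may define it differently.

```python
from collections import Counter


def array_intersect(array1, array2):
    """Bag-style intersection (assumed semantics, not taken from the PR).

    Each element is kept min(count in array1, count in array2) times,
    in the order in which it appears in array1.
    """
    remaining = Counter(array2)  # how many copies of each element array2 can still match
    result = []
    for item in array1:
        if remaining[item] > 0:
            result.append(item)
            remaining[item] -= 1
    return result


def array_distinct(array):
    """Remove duplicates while preserving first-occurrence order."""
    return list(dict.fromkeys(array))


with_duplicates = array_intersect([1, 1, 2, 3], [1, 1, 1, 3, 4])
without_duplicates = array_distinct(with_duplicates)
```

   With these inputs `with_duplicates` is `[1, 1, 3]` and `without_duplicates` is `[1, 3]`. Going the other way is not possible: once duplicates are dropped, the counts cannot be recovered.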
   
   Yes, `array_union` looks like an exception here, however if we compare 
against vendors it is an exception everywhere, simply because there is a 
convenient counterpart function, `array_concat`, that keeps duplicates. So if we 
want to concatenate arrays without duplicates we use `array_union`, and otherwise 
`array_concat`. The problem is that not every function has such a workaround.
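
   The `array_union` / `array_concat` pairing can be sketched the same way. Again this is a Python illustration under my own assumptions (the functions only borrow the SQL names, and order of first occurrence is preserved), not vendor or Flink code:

```python
def array_concat(array1, array2):
    """Concatenate two arrays, keeping every duplicate."""
    return list(array1) + list(array2)


def array_union(array1, array2):
    """Concatenate two arrays and drop duplicates (first occurrence wins)."""
    return list(dict.fromkeys(array_concat(array1, array2)))


concatenated = array_concat([1, 2, 2], [2, 3])
unioned = array_union([1, 2, 2], [2, 3])
```

   Here `concatenated` is `[1, 2, 2, 2, 3]` and `unioned` is `[1, 2, 3]`, so both behaviours are reachable through two differently named functions. `array_intersect` has no such second name.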

