snuyanzin commented on PR #24526: URL: https://github.com/apache/flink/pull/24526#issuecomment-2105169563

Maybe it is an unpopular opinion, but I tend to think that the `INTERSECT` vs. `INTERSECT ALL` distinction (and the analogous ones for the other set operations) is defined for rows under set and bag semantics and can hardly be applied to collections. I failed to find a standard approach for collections like arrays (I mean in the SQL Standard). That is probably one of the reasons we see a number of vendors handling this differently.

On one hand, we could say we should follow the same approach as for rows. The problem I see here is that duplicates would then be removed by default, but what should we do if we want to keep them? There is no well-known vendor providing both `array_intersect` and `array_intersect_all`, or a keep/remove-duplicates parameter. On the other hand, if we keep duplicates, we can still cover both cases: for the case with duplicates

```
array_intersect(array1, array2)
```

and for the case without

```
array_distinct(array_intersect(array1, array2))
```

Yes, `array_union` looks like an exception here; however, if we compare against vendors, it is a global exception only because there is a nice synonym used for another function, `array_concat`. So if we want to concat arrays without duplicates we use `array_union`, and with duplicates `array_concat`. The problem is that not every function has such a workaround.
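To make the proposal concrete, here is a minimal Python sketch (not Flink code; the function names mirror the SQL functions under discussion) of what a duplicate-preserving `array_intersect` would do, and how `array_distinct` on top of it recovers the set-semantics result:

```python
from collections import Counter

def array_intersect(a, b):
    """Bag-semantics intersection: each element is kept as many times
    as it appears in BOTH inputs (the minimum of the two counts)."""
    counts = Counter(a) & Counter(b)  # '&' on Counters takes the min count
    result = []
    for x in a:  # preserve the element order of the first array
        if counts[x] > 0:
            result.append(x)
            counts[x] -= 1
    return result

def array_distinct(arr):
    """Remove duplicates, preserving first-occurrence order."""
    seen = set()
    return [x for x in arr if not (x in seen or seen.add(x))]

# keep duplicates (bag semantics)
print(array_intersect([1, 1, 2, 3], [1, 1, 3, 3]))  # -> [1, 1, 3]
# set semantics via the workaround described above
print(array_distinct(array_intersect([1, 1, 2, 3], [1, 1, 3, 3])))  # -> [1, 3]
```

The point of the sketch is that the duplicate-preserving variant is strictly more expressive: the deduplicated result is always recoverable by composing with `array_distinct`, whereas the reverse is not possible.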
May be it is an unpopular opinion however I tend to think that `INTERSECT` vs `INTERSECT ALL` and the same for others for set and bag semantics is defined for rows and hardly could be applied for collections. I failed to find standard approach for collections like arrays (I mean in SQL Stadard). Probably that is one of the reasons we could see a number of vendors are handling this differently. From one side we could say that we should follow the same approach as for rows. The problem I see here is that by default we will remove duplicates however what should we do if we want to keep them? There is no well known vendor providing both `array_intersect` and `array_intersect_all` or keep/remove duplicates as a parameter. At the same side if we keep duplicates then we will still be able to cover both cases: we can do for case with duplicates ``` array_intersect(array1, array2) ``` and for case without ``` array_distinct(array_intersect(array1, array2)) ``` Yes, `array_union` looks like an exception here, however if we compare against vendors then it is a global exception just because there is a nice synonym which is used for another function `array_concat`. So if we want to concat arrays without duplicates we use `array_union` and then `array_concat`. The problem is that not every function has such a workaround . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org