izveigor commented on issue #6980: URL: https://github.com/apache/arrow-datafusion/issues/6980#issuecomment-1637045457
> @izveigor, since there are array_intersect and array_except, did you find any function similar to array_union, the function that returns a list of all elements that exist in `one of the l1 and l2` without duplicates? array_union([1,2,3], [3,4,5]) -> [1,2,3,4,5]. It seems to me a helpful set operation, but not sure why Duckdb does not have this one.

I think we can use the Apache Spark SQL and Azure Databricks function `array_union` (see https://sparkbyexamples.com/spark/spark-sql-array-functions/ and https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/array_union). Yes, I agree that the full set of set operations is a very powerful tool.

> It makes sense to allow nested array as the lhs value of three of the array_has function. We just need to check the element one by one.

Absolutely agree! I think this is the main mistake in PostgreSQL's array function set. Every function should treat each element as a unit, even when that element is itself an array/list.

> The difference of array_has and array_has_all is the rhs value. The former is element (non-array), and the latter is array (one-dimension). Should we consider them as two different functions as Duckdb does?

I prefer the ClickHouse/DuckDB version of these functions, because it makes fuller use of set semantics: we need to check both whether an element belongs to a set and whether one set is a subset of another.
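
To make the semantics discussed above concrete, here is a minimal Python sketch. It is not DataFusion code: the function names simply mirror the ones in this thread, and the order-preserving deduplication and the lack of NULL handling are my own assumptions for illustration.

```python
def array_union(l1, l2):
    """Elements present in either list, in first-seen order, without duplicates."""
    result = []
    for item in l1 + l2:
        # Equality-based check, so nested lists are compared as whole elements.
        if item not in result:
            result.append(item)
    return result


def array_has(haystack, needle):
    """Membership test: is `needle` (scalar or nested list) an element of `haystack`?"""
    return needle in haystack


def array_has_all(haystack, needles):
    """Subset test: is every element of `needles` an element of `haystack`?"""
    return all(n in haystack for n in needles)


print(array_union([1, 2, 3], [3, 4, 5]))      # [1, 2, 3, 4, 5]
print(array_has([[1, 2], [3, 4]], [1, 2]))    # True
print(array_has_all([1, 2, 3], [1, 3]))       # True
```

Keeping `array_has` and `array_has_all` separate means the rhs is never ambiguous: a nested list passed to `array_has` is always one element to look for, never a set of elements.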
