izveigor commented on issue #6980:
URL: https://github.com/apache/arrow-datafusion/issues/6980#issuecomment-1637045457

   > @izveigor, since there are array_intersect and array_except, did you find 
any function similar to array_union, i.e. a function that returns a list of all 
elements that exist in either l1 or l2, without duplicates? For example, 
array_union([1,2,3], [3,4,5]) -> [1,2,3,4,5]. It seems like a helpful set 
operation to me, but I'm not sure why DuckDB does not have this one.
   
   I think we can follow the `array_union` function from Apache Spark SQL and 
Azure Databricks (see: 
https://sparkbyexamples.com/spark/spark-sql-array-functions/ and 
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/array_union).
 Yes, I agree that a complete set of set operations is a very powerful tool.
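   To make the intended semantics concrete, here is a minimal Python sketch of the three set operations on plain lists. It is not DataFusion's, Spark's, or DuckDB's implementation; the first-seen result order and the use of linear `in` checks (so nested, unhashable list elements still work) are assumptions made for illustration, and NULL handling is ignored.

```python
from typing import Any, List


def array_union(l1: List[Any], l2: List[Any]) -> List[Any]:
    """Elements found in either list, without duplicates, in first-seen order."""
    result: List[Any] = []
    for item in l1 + l2:
        # Linear membership check keeps this working for nested (unhashable) lists.
        if item not in result:
            result.append(item)
    return result


def array_intersect(l1: List[Any], l2: List[Any]) -> List[Any]:
    """Elements of l1 that also occur in l2, without duplicates."""
    result: List[Any] = []
    for item in l1:
        if item in l2 and item not in result:
            result.append(item)
    return result


def array_except(l1: List[Any], l2: List[Any]) -> List[Any]:
    """Elements of l1 that do not occur in l2, without duplicates."""
    result: List[Any] = []
    for item in l1:
        if item not in l2 and item not in result:
            result.append(item)
    return result


print(array_union([1, 2, 3], [3, 4, 5]))      # [1, 2, 3, 4, 5]
print(array_intersect([1, 2, 3], [3, 4, 5]))  # [3]
print(array_except([1, 2, 3], [3, 4, 5]))     # [1, 2]
```

   Together, array_union, array_intersect and array_except cover union, intersection and difference, which is the "full set" of basic set operations discussed above.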
   
   > It makes sense to allow a nested array as the lhs value of the three 
array_has functions. We just need to check the elements one by one.
   
   Absolutely agree! I think this is the main shortcoming of PostgreSQL's set of 
array functions. Every function should be able to process each element, 
including an element that is itself an array/list.
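   A minimal Python sketch of the "check the elements one by one" idea: because each top-level element of the lhs is compared with `==`, a nested lhs works without special-casing, and a whole sub-list can be the value searched for. This only illustrates the semantics under that assumption; it is not DataFusion's code and ignores NULLs.

```python
from typing import Any, List


def array_has(haystack: List[Any], needle: Any) -> bool:
    """True if some top-level element of `haystack` equals `needle`."""
    for element in haystack:
        # An element may itself be a list; == compares lists element-wise.
        if element == needle:
            return True
    return False


print(array_has([1, 2, 3], 2))              # True
print(array_has([[1, 2], [3, 4]], [3, 4]))  # True: the needle is a whole sub-list
print(array_has([[1, 2], [3, 4]], 3))       # False: 3 is not a top-level element
```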
   
   > The difference between array_has and array_has_all is the rhs value: the 
former takes an element (non-array), and the latter takes a one-dimensional 
array. Should we keep them as two different functions, as DuckDB does?
   
   I prefer the ClickHouse/DuckDB versions of these functions, because they let 
us make full use of set semantics (specifically, checking whether an element 
belongs to a set and checking whether one set is a subset of another).
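   A minimal Python sketch of why separate functions map cleanly onto set questions: array_has is element membership and array_has_all is a subset test. The third variant, assumed here to be array_has_any (an overlap test), is included for completeness. Names and semantics are illustrative assumptions, not a definitive specification, and NULL handling is ignored.

```python
from typing import Any, List


def array_has(haystack: List[Any], element: Any) -> bool:
    """Membership: is `element` one of the items of `haystack`?"""
    return element in haystack


def array_has_all(haystack: List[Any], subset: List[Any]) -> bool:
    """Subset test: is every item of `subset` contained in `haystack`?"""
    return all(item in haystack for item in subset)


def array_has_any(haystack: List[Any], candidates: List[Any]) -> bool:
    """Overlap test: does at least one item of `candidates` occur in `haystack`?"""
    return any(item in haystack for item in candidates)


print(array_has([1, 2, 3], 2))           # True
print(array_has_all([1, 2, 3], [1, 3]))  # True
print(array_has_all([1, 2, 3], [1, 4]))  # False
print(array_has_any([1, 2, 3], [4, 3]))  # True
```

   Keeping the rhs type fixed per function (scalar vs. array) avoids ambiguity when the lhs is nested: with a single overloaded function, an array rhs could mean either "this sub-array is an element" or "all of these are elements".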

