Re: [I] Support multi column IN lists like `(c1, c2) IN ((c1, c2), ,,,)` [arrow-datafusion]

via GitHub Fri, 08 Dec 2023 00:59:14 -0800


my-vegetable-has-exploded commented on issue #6635:
URL: 
https://github.com/apache/arrow-datafusion/issues/6635#issuecomment-1846796006


   Thanks for your patience @alamb. After your explanation, I think I 
understand it more clearly.
   
   ```Tuple([Value('a'), Value('b')], ...)``` can be handled as 
ScalarValue(struct), and ```struct(col1, col2)``` will be 
```BuiltinScalarFunction::Struct``` whose args are col1 and col2.
   
   And the in_list code is already generalized:
   
   
https://github.com/apache/arrow-datafusion/blob/c0c9e8888878c5d7f2586cf605702430c94ea425/datafusion/physical-expr/src/expressions/in_list.rs#L360-L363
   
   But StructArray is not comparable in arrow-rs since it is nested. Should we 
implement the comparison in datafusion or upstream?
   
   For example,
   
   ```
   ❯ CREATE TABLE colors (
       color_id INT PRIMARY KEY,
       color_name VARCHAR(50)
   );
   INSERT INTO colors (color_id, color_name) VALUES (1, 'Red'), (2, 'Blue');
   
   ❯ SELECT * FROM colors WHERE struct(color_id) IN (struct(arrow_cast(1,'Int32')));
   Arrow error: Invalid argument error: Invalid comparison operation: Struct([Field { name: "c0", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) == Struct([Field { name: "c0", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])
   ```
   ```
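
To make the question concrete, here is a rough sketch of what a struct comparison for IN lists might look like: compare the struct field by field, AND the per-field results, then OR across the list items. This is only an illustration in plain Rust and does not use the arrow-rs or datafusion APIs. The `Column` alias and the functions `kleene_and`, `kleene_or`, `struct_row_eq` and `struct_in_list` are hypothetical names, and the NULL handling (SQL-style three-valued logic) is an assumption about the semantics we would want.

```rust
/// Hypothetical stand-in for one child column of a struct array.
/// `None` models a SQL NULL. Only Int32-like values are covered here.
type Column = Vec<Option<i32>>;

/// Three-valued (Kleene) AND: false wins, otherwise NULL is contagious.
fn kleene_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Three-valued (Kleene) OR: true wins, otherwise NULL is contagious.
fn kleene_or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Compare row `row` of the struct (given as its child columns) with one
/// literal tuple by AND-ing the equality of each field.
fn struct_row_eq(columns: &[Column], row: usize, literal: &[Option<i32>]) -> Option<bool> {
    columns
        .iter()
        .zip(literal)
        .fold(Some(true), |acc, (col, lit)| {
            let field_eq = match (col[row], lit) {
                (Some(l), Some(r)) => Some(l == *r),
                _ => None, // comparing with NULL yields NULL (assumption)
            };
            kleene_and(acc, field_eq)
        })
}

/// Evaluate `struct(columns...) IN (list...)` for every row by OR-ing the
/// row's equality against each literal tuple in the list.
fn struct_in_list(columns: &[Column], list: &[Vec<Option<i32>>]) -> Vec<Option<bool>> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    (0..num_rows)
        .map(|row| {
            list.iter().fold(Some(false), |acc, lit| {
                kleene_or(acc, struct_row_eq(columns, row, lit))
            })
        })
        .collect()
}
```

Whether logic like this belongs in datafusion or upstream in arrow-rs is exactly the open question above; the sketch only shows that the nested comparison can be reduced to comparisons on the child columns.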


