izveigor commented on PR #6936:
URL: 
https://github.com/apache/arrow-datafusion/pull/6936#issuecomment-1660872766

   @alamb I think we should remain `array.slice()` instead of `take` kernel. 
There are some reasons:
   1) I didn't find the optimal decision for saving the structure for 
`array_slice` and `array_value` (use common parts of the code for both 
expressions) (because `array_value` returns `NULL` statement instead of empty 
array, and if we don't want to inflate code, we should remain `array.slice()` 
statement) (anyway, `array_value` would use the same way like the current 
implementation)
   2) I don't think the current implementation is more slowly than `take` 
analog.
   3) if `take` analogue could theoretically be faster, we could fix it at any 
time.
   
   I understood my mistake about index column support. Now, in my opinion I 
should direct all forces to fix the bug with struct (which is very important 
for IOx). I think it is better development path than comparing `take` and 
`slice`.
   
   The main problem with bug related to struct is difference between 
circumstances for `List` and `Struct` cases. For example, DuckDB can accept 
column support for `List` statement (like: `select make_array(1, 2, 3, 4, 5)[a] 
from index`)), but can not for `Struct`. So, I can solve this problem by using 
union struct (struct with two optional options):
   ```
   #[derive(Clone, PartialEq, Eq, Hash, Debug)]
   pub struct GetIndexedFieldKey {
       ///
       pub list_key: Option<Expr>,
       /// 
       pub struct_key: Option<ScalarValue>,
   }
   
   impl GetIndexedFieldKey {
       /// Create a new GetIndexedFieldKey expression
       pub fn new(list_key: Option<Expr>, struct_key: Option<ScalarValue>) -> 
Self {
           // value must be either `list_key` or `struct_key`
           assert_ne!(list_key.is_some(), struct_key.is_some());
           assert_ne!(list_key.is_none(), struct_key.is_none());
   
           Self {
               list_key,
               struct_key,
           }
       }
   }
   ```
   
   ```
   /// expression to get a field of a struct array.
   #[derive(Debug, Hash)]
   pub struct GetIndexedFieldExpr {
       /// The expression to find
       arg: Arc<dyn PhysicalExpr>,
       /// The key statement
       key: GetIndexedFieldKey,
       /// The extra key (it can be used only with `DataType::List`)
       extra_key: Option<Arc<dyn PhysicalExpr>>,
   }
   ```
   
   What do you think about that approach, @alamb?
   
   P.S. I really want that DataFusion index system matches with DuckDB and 
ClickHouse.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to