izveigor commented on PR #6936:
URL:
https://github.com/apache/arrow-datafusion/pull/6936#issuecomment-1660872766
@alamb I think we should remain `array.slice()` instead of `take` kernel.
There are some reasons:
1) I didn't find the optimal decision for saving the structure for
`array_slice` and `array_value` (use common parts of the code for both
expressions) (because `array_value` returns `NULL` statement instead of empty
array, and if we don't want to inflate code, we should remain `array.slice()`
statement) (anyway, `array_value` would use the same way like the current
implementation)
2) I don't think the current implementation is more slowly than `take`
analog.
3) if `take` analogue could theoretically be faster, we could fix it at any
time.
I understood my mistake about index column support. Now, in my opinion I
should direct all forces to fix the bug with struct (which is very important
for IOx). I think it is better development path than comparing `take` and
`slice`.
The main problem with bug related to struct is difference between
circumstances for `List` and `Struct` cases. For example, DuckDB can accept
column support for `List` statement (like: `select make_array(1, 2, 3, 4, 5)[a]
from index`)), but can not for `Struct`. So, I can solve this problem by using
union struct (struct with two optional options):
```
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct GetIndexedFieldKey {
///
pub list_key: Option<Expr>,
///
pub struct_key: Option<ScalarValue>,
}
impl GetIndexedFieldKey {
/// Create a new GetIndexedFieldKey expression
pub fn new(list_key: Option<Expr>, struct_key: Option<ScalarValue>) ->
Self {
// value must be either `list_key` or `struct_key`
assert_ne!(list_key.is_some(), struct_key.is_some());
assert_ne!(list_key.is_none(), struct_key.is_none());
Self {
list_key,
struct_key,
}
}
}
```
```
/// expression to get a field of a struct array.
#[derive(Debug, Hash)]
pub struct GetIndexedFieldExpr {
/// The expression to find
arg: Arc<dyn PhysicalExpr>,
/// The key statement
key: GetIndexedFieldKey,
/// The extra key (it can be used only with `DataType::List`)
extra_key: Option<Arc<dyn PhysicalExpr>>,
}
```
What do you think about that approach, @alamb?
P.S. I really want that DataFusion index system matches with DuckDB and
ClickHouse.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]