alamb commented on code in PR #6936:
URL: https://github.com/apache/arrow-datafusion/pull/6936#discussion_r1278546792
##########
datafusion/proto/src/logical_plan/to_proto.rs:
##########
@@ -951,11 +951,24 @@ impl TryFrom<&Expr> for protobuf::LogicalExprNode {
// see discussion in
https://github.com/apache/arrow-datafusion/issues/2565
return Err(Error::General("Proto serialization error:
Expr::ScalarSubquery(_) | Expr::InSubquery(_) | Expr::Exists { .. } |
Exp:OuterReferenceColumn not supported".to_string()));
}
- Expr::GetIndexedField(GetIndexedField { key, expr }) => Self {
+ Expr::GetIndexedField(GetIndexedField {
Review Comment:
If you can't figure out the protobuf part, I think we have in the past added
a "NotImplemented" error and filed a follow on ticket to add support
Though maybe this comment is outdated as it looks like you have the code here
##########
datafusion/expr/src/field_util.rs:
##########
@@ -18,28 +18,31 @@
//! Utility functions for complex field access
use arrow::datatypes::{DataType, Field};
-use datafusion_common::{DataFusionError, Result, ScalarValue};
+use datafusion_common::{DataFusionError, Result};
/// Returns the field access indexed by `key` from a [`DataType::List`] or
[`DataType::Struct`]
/// # Error
/// Errors if
-/// * the `data_type` is not a Struct or,
+/// * the `data_type` is not a Struct or a List,
Review Comment:
Maybe we could update the documentation to explain what an `extra_key` is
and give an example?
##########
datafusion/physical-expr/src/expressions/get_indexed_field.rs:
##########
@@ -74,70 +90,63 @@ impl PhysicalExpr for GetIndexedFieldExpr {
}
fn data_type(&self, input_schema: &Schema) -> Result<DataType> {
- let data_type = self.arg.data_type(input_schema)?;
- get_data_type_field(&data_type, &self.key).map(|f|
f.data_type().clone())
+ let arg_dt = self.arg.data_type(input_schema)?;
+ let key_dt = self.key.data_type(input_schema)?;
+ let extra_key_dt = if let Some(extra_key) = &self.extra_key {
+ Some(extra_key.data_type(input_schema)?)
+ } else {
+ None
+ };
+ get_data_type_field(&arg_dt, &key_dt, &extra_key_dt)
+ .map(|f| f.data_type().clone())
}
fn nullable(&self, input_schema: &Schema) -> Result<bool> {
- let data_type = self.arg.data_type(input_schema)?;
- get_data_type_field(&data_type, &self.key).map(|f| f.is_nullable())
+ let arg_dt = self.arg.data_type(input_schema)?;
+ let key_dt = self.key.data_type(input_schema)?;
+ let extra_key_dt = if let Some(extra_key) = &self.extra_key {
+ Some(extra_key.data_type(input_schema)?)
+ } else {
+ None
+ };
+ get_data_type_field(&arg_dt, &key_dt, &extra_key_dt).map(|f|
f.is_nullable())
}
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
let array = self.arg.evaluate(batch)?.into_array(1);
- match (array.data_type(), &self.key) {
- (DataType::List(_) | DataType::Struct(_), _) if self.key.is_null()
=> {
- let scalar_null: ScalarValue = array.data_type().try_into()?;
- Ok(ColumnarValue::Scalar(scalar_null))
+ let key = self.key.evaluate(batch)?.into_array(1);
+ if let Some(extra_key) = &self.extra_key {
+ let extra_key = extra_key.evaluate(batch)?.into_array(1);
+ match (array.data_type(), key.data_type(), extra_key.data_type()) {
+ (DataType::List(_), DataType::Int64, DataType::Int64)
+ | (DataType::List(_), DataType::Int64, DataType::Null)
+ | (DataType::List(_), DataType::Null, DataType::Int64)
+ | (DataType::List(_), DataType::Null, DataType::Null) =>
Ok(ColumnarValue::Array(array_slice(&[
Review Comment:
I think this would entail supporting something like `SELECT
array_column[int_column] FROM t` which while cool I think would be much less
common than `SELECT array_column[3] FROM t` (aka a constant) as you have done
here
I think writing the code that takes columnar inputs is likely something we
could work out upstream (aka in arrow-rs) as a `list_take` or similar kernel.
Thus I would suggest we get this PR in (which errors with a non constant)
and file a ticket to support columnar field access as a follow on PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]