alamb commented on a change in pull request #1235:
URL: https://github.com/apache/arrow-datafusion/pull/1235#discussion_r742720554
##########
File path: datafusion/src/physical_plan/planner.rs
##########
@@ -1422,6 +1422,10 @@ impl DefaultPhysicalPlanner {
new_plan = optimizer.optimize(new_plan, &ctx_state.config)?;
observer(new_plan.as_ref(), optimizer.as_ref())
}
+ debug!(
+ "Optimized physical plan short version:\n{}\n",
+ displayable(new_plan.as_ref()).indent().to_string()
Review comment:
```suggestion
displayable(new_plan.as_ref()).indent()
```
I don't think the call to `to_string()` is needed here -- not that it is a
huge deal
##########
File path: datafusion/src/physical_plan/expressions/min_max.rs
##########
@@ -38,6 +38,15 @@ use arrow::{
use super::format_state_name;
+// min/max aggregation only returns a value for each group
Review comment:
Maybe we could modify the comment here to explain that since the
`Min/Max` aggregate can handle `Dictionary` encoded inputs, but always produces
"unpacked" (aka non `Dictionary`) output we need to adjust the output data type
to reflect this.
##########
File path: datafusion/tests/sql.rs
##########
@@ -3976,6 +3976,12 @@ async fn query_on_string_dictionary() -> Result<()> {
let expected = vec![vec!["2"]];
assert_eq!(expected, actual);
+ // aggregation min
Review comment:
Since the changes are needed for min and max, I suggest also adding a
test here for `max` to ensure it is covered
##########
File path: datafusion/src/physical_plan/expressions/min_max.rs
##########
@@ -38,6 +38,15 @@ use arrow::{
use super::format_state_name;
+// min/max aggregation only returns a value for each group
+// and should not be a Dictionary data type anymore
+fn min_max_aggregate_data_type(input_type: DataType) -> DataType {
+ match input_type {
+ DataType::Dictionary(_, value_type) => (*value_type).clone(),
+ _ => input_type.clone(),
+ }
Review comment:
I think you can save the clone by reusing the types passed in. Something
like (untested)
```suggestion
if let DataType::Dictionary(_, value_type) = input_type {
value_type
} else {
input_type
}
```
I doubt the extra clone makes any noticeable difference, but I figured I
would point it out
##########
File path: datafusion/src/physical_plan/expressions/min_max.rs
##########
@@ -72,15 +81,15 @@ impl AggregateExpr for Max {
fn field(&self) -> Result<Field> {
Ok(Field::new(
&self.name,
- self.data_type.clone(),
+ min_max_aggregate_data_type(self.data_type.clone()),
Review comment:
I think it might be clearer code if this datatype transformation was done as
part of the construction of `Min` and `Max` -- that way `self.data_type` would
contain the correct output type for the aggregate, and future code would not
have to remember to translate `self.data_type`
Perhaps something like
```
impl Max {
/// Create a new MAX aggregate function
pub fn new(
expr: Arc<dyn PhysicalExpr>,
name: impl Into<String>,
data_type: DataType,
) -> Self {
Self {
name: name.into(),
expr,
data_type: min_max_aggregate_data_type(data_type),
nullable: true,
}
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]