Jefffrey commented on code in PR #18593: URL: https://github.com/apache/datafusion/pull/18593#discussion_r2510483261
########## datafusion/functions-aggregate-common/src/noop_accumulator.rs: ########## @@ -0,0 +1,70 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow::array::ArrayRef; +use datafusion_common::{Result, ScalarValue}; +use datafusion_expr_common::accumulator::Accumulator; + +/// [`Accumulator`] that does no work and always returns a fixed value (default +/// of `NULL` but can be customized). +/// +/// Useful for aggregate functions that need to handle an input of [`DataType::Null`] +/// that does no work. +/// +/// [`DataType::Null`]: arrow::datatypes::DataType::Null +#[derive(Debug)] +pub struct NoopAccumulator { + evaluate_value: ScalarValue, +} Review Comment: I looked into other ideas like trying to implement `simplify` on `AggregateUDFImpl` to return `Null` directly instead of having this noop accumulator but couldn't get it to work; figured can at least centralize this boilerplate and look into easier/more ergonomic handling of pure null inputs for UDFs later ########## datafusion/functions-aggregate/src/bit_and_or_xor.rs: ########## @@ -249,23 +253,22 @@ impl AggregateUDFImpl for BitwiseOperation { } fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { - let arg_type = &arg_types[0]; - if !arg_type.is_integer() { - return exec_err!( - "[return_type] {} not supported for {}", - self.name(), - arg_type - ); - } - Ok(arg_type.clone()) + Ok(arg_types[0].clone()) } fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> { downcast_bitwise_accumulator!(acc_args, self.operation, acc_args.is_distinct) } fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<FieldRef>> { - if self.operation == BitwiseOperationType::Xor && args.is_distinct { + if args.input_fields[0].data_type().is_null() { Review Comment: Needing to change state_fields is not desirable, but until we can make accumulators own state_fields (instead of the function) this will have to do - See: https://github.com/apache/datafusion/issues/14701#issuecomment-3419742933 ########## datafusion/functions-aggregate/src/bit_and_or_xor.rs: ########## @@ -228,7 +229,10 @@ impl BitwiseOperation { ) -> Self { Self { operation: operator, - signature: Signature::uniform(1, INTEGERS.to_vec(), Volatility::Immutable), + signature: Signature::coercible( + vec![Coercion::new_exact(TypeSignatureClass::Integer)], + Volatility::Immutable, + ), Review Comment: Main change here; a consequence is `DataType::Null` input comes in as `DataType::Null` instead of previously being casted to `DataType::Int8`. ########## datafusion/functions-aggregate/src/approx_distinct.rs: ########## @@ -269,29 +267,6 @@ where default_accumulator_impl!(); } -impl Accumulator for NullHLLAccumulator { Review Comment: Technically unrelated to the main change, but since I introduce `NoopAccumulator` I figured it would be good to use it in other places too, so refactored this. ########## datafusion/functions-aggregate/src/bit_and_or_xor.rs: ########## @@ -89,6 +89,7 @@ macro_rules! accumulator_helper { macro_rules! downcast_bitwise_accumulator { ($args:ident, $opr:expr, $is_distinct: expr) => { match $args.return_field.data_type() { + DataType::Null => Ok(Box::new(NoopAccumulator::default())), Review Comment: Handling of Null input type; introduce a new `NoopAccumulator` which can be used for other aggregate functions, like approx_distinct ########## datafusion/sqllogictest/test_files/aggregate.slt: ########## @@ -7134,11 +7140,16 @@ statement ok drop table employee_csv; # test null literal handling in supported aggregate functions -query I??III?T +query I??????T select count(null), min(null), max(null), bit_and(NULL), bit_or(NULL), bit_xor(NULL), nth_value(NULL, 1), string_agg(NULL, ','); ---- 0 NULL NULL NULL NULL NULL NULL NULL +query TTT +SELECT arrow_typeof(bit_and(NULL)), arrow_typeof(bit_or(NULL)), arrow_typeof(bit_xor(NULL)) +---- +Null Null Null Review Comment: Note how null inputs now have null output types too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
