berkaysynnada commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r1966851842
########## datafusion/common/src/scalar/mod.rs: ########## @@ -1583,6 +1583,17 @@ impl ScalarValue { } } + /// Returns negation for a boolean scalar value + pub fn boolean_negate(&self) -> Result<Self> { Review Comment: Maybe we can extend arithmetic_negate() with booleans, wdyt? ########## datafusion/expr-common/src/interval_arithmetic.rs: ########## @@ -424,6 +416,14 @@ impl Interval { ) } + pub fn is_null(&self) -> bool { Review Comment: "null interval" term might mislead, maybe we can rename it as is_unbounded or is_infinite ? ########## datafusion/physical-expr/src/expressions/in_list.rs: ########## @@ -398,6 +399,51 @@ impl PhysicalExpr for InListExpr { self.static_filter.clone(), ))) } + + /// The output interval is computed by checking if the list item intervals are + /// a subset of, overlap, or are disjoint with the input expression's interval. + /// + /// If [InListExpr::negated] is true, the output interval gets negated. + /// + /// # Example: + /// If the input expression's interval is a superset of the + /// conjunction of the list items intervals, the output + /// interval is [`Interval::CERTAINLY_TRUE`]. + /// + /// ```text + /// interval of expr: ....---------------------.... + /// Some list items: ..........|..|.....|.|....... + /// + /// output interval: [`true`, `true`] + /// ``` + fn evaluate_bounds(&self, children: &[&Interval]) -> Result<Interval> { + let expr_bounds = children[0]; + + debug_assert!( + children.len() >= 2, + "InListExpr requires at least one list item" + ); + + // conjunction of list item intervals + let list_bounds = children + .iter() + .skip(2) + .try_fold(children[1].clone(), |acc, item| acc.union(*item))?; + + if self.negated { + expr_bounds.contains(list_bounds)?.boolean_negate() + } else { + expr_bounds.contains(list_bounds) Review Comment: Let's say expr_bounds is [3,5], and list bounds are [[0,2], [6,9]]. As the union logic cannot yet support interval sets, the union result would be [0, 9], and this contains check will give certainly_true; however, the actual result is certainly_false. I think that's not what you prefer Even if we can support interval sets, let's consider this example expr_bounds is [0,10] & list_bounds: [[2,6], [3,5], [6,9]]. Do we want to return certainly_true? expr could be 5, and the list might include [3,4,7]. So, in_list() is false, even if we say its bounds are certainly_true. Am I missing something? ########## datafusion/expr/src/udf.rs: ########## @@ -717,9 +722,18 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// /// If the function is `ABS(a)`, and the input interval is `a: [-3, 2]`, /// then the output interval would be `[0, 3]`. - fn evaluate_bounds(&self, _input: &[&Interval]) -> Result<Interval> { - // We cannot assume the input datatype is the same of output type. - Interval::make_unbounded(&DataType::Null) + fn evaluate_bounds(&self, input: &[&Interval]) -> Result<Interval> { Review Comment: I really like getting rid of Null bounds and looking up the result type, but I have a question: You declared supports_bounds_evaluation() as false. So, do the "evaluate" invokers even call this function after seeing supports_bounds_evaluation() as false? I think supports_bounds_evaluation() shouldn't say false ########## datafusion/expr-common/src/operator.rs: ########## @@ -164,6 +164,25 @@ impl Operator { ) } + /// Indicates whether this operator supports interval arithmetic + pub fn supports_bounds_evaluation(&self) -> bool { Review Comment: supports_interval_evaluation or supports_range_evaluation, wdyt? ########## datafusion/expr-common/src/interval_arithmetic.rs: ########## @@ -880,6 +925,19 @@ impl Interval { upper: self.lower().clone().arithmetic_negate()?, }) } + + pub fn boolean_negate(self) -> Result<Self> { + if self.data_type() != DataType::Boolean { Review Comment: likewise here, would it be too inclusive to use "arithmetic" keyword for booleans? ########## datafusion/expr-common/src/interval_arithmetic.rs: ########## @@ -902,6 +960,15 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result<I Operator::Minus => lhs.sub(rhs), Operator::Multiply => lhs.mul(rhs), Operator::Divide => lhs.div(rhs), + Operator::IsDistinctFrom | Operator::IsNotDistinctFrom => { + NullableInterval::from(lhs) + .apply_operator(op, &rhs.into()) + .and_then(|x| { Review Comment: We can avoid cloning here actually introducing a new taker API to NullableInterval ########## datafusion/expr-common/src/interval_arithmetic.rs: ########## @@ -306,23 +306,15 @@ impl Interval { // Standardize floating-point endpoints: DataType::Float32 => handle_float_intervals!(Float32, f32, lower, upper), DataType::Float64 => handle_float_intervals!(Float64, f64, lower, upper), - // Unsigned null values for lower bounds are set to zero: - DataType::UInt8 if lower.is_null() => Self { - lower: ScalarValue::UInt8(Some(0)), - upper, - }, - DataType::UInt16 if lower.is_null() => Self { - lower: ScalarValue::UInt16(Some(0)), - upper, - }, - DataType::UInt32 if lower.is_null() => Self { - lower: ScalarValue::UInt32(Some(0)), - upper, - }, - DataType::UInt64 if lower.is_null() => Self { - lower: ScalarValue::UInt64(Some(0)), - upper, - }, + // Lower bounds of unsigned integer null values are set to zero: Review Comment: 👍🏻 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org