kosiew opened a new issue, #23158: URL: https://github.com/apache/datafusion/issues/23158
## Summary `datafusion/sql/src/unparser/expr.rs` [currently](https://github.com/apache/datafusion/tree/994b926168033e06c4b379b7bac7667c958cfc1b) handles `Operator::IsDistinctFrom` and `Operator::IsNotDistinctFrom` in separate match arms with duplicated dialect-dispatch logic. Factor this logic into a small helper so the dialect-specific spelling, null-safe semantics, and parenthesization rules are encoded in one place. ## Context #22999 adds MySQL-specific unparsing for DataFusion's distinct-from operators: - `a IS NOT DISTINCT FROM b` -> `a <=> b` - `a IS DISTINCT FROM b` -> `NOT (a <=> b)` The implementation adds similar `match self.dialect.distinct_from_style()` blocks in both operator arms. This duplication makes it easier for future changes to update one operator but miss the other, especially around SQL precedence and dialect-specific syntax. ## Problem The invariant is: both `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` must be emitted as a matched pair for each dialect, preserving null-safe equality semantics and unambiguous SQL parsing. Duplicating the dialect mapping in two branches weakens that invariant because: - Parenthesization fixes must be applied in both places. - New dialect styles must update both operators consistently. - The relationship between the two operators is not explicit in code. ## Proposed refactor Add a private helper in `datafusion/sql/src/unparser/expr.rs`, for example: ```rust fn distinct_from_to_sql( &self, left: ast::Expr, right: ast::Expr, negated: bool, ) -> Result<ast::Expr> { match self.dialect.distinct_from_style() { DistinctFromStyle::FullText => { let expr = if negated { ast::Expr::IsNotDistinctFrom(Box::new(left), Box::new(right)) } else { ast::Expr::IsDistinctFrom(Box::new(left), Box::new(right)) }; Ok(ast::Expr::Nested(Box::new(expr))) } DistinctFromStyle::Spaceship => { let spaceship = ast::Expr::Nested(Box::new(ast::Expr::BinaryOp { left: Box::new(left), op: BinaryOperator::Spaceship, right: Box::new(right), })); if negated { Ok(spaceship) } else { Ok(ast::Expr::Nested(Box::new(ast::Expr::UnaryOp { op: UnaryOperator::Not, expr: Box::new(spaceship), }))) } } } } ``` Then both match arms become small and symmetrical: ```rust Operator::IsDistinctFrom => { let left = self.expr_to_sql_inner(left.as_ref())?; let right = self.expr_to_sql_inner(right.as_ref())?; self.distinct_from_to_sql(left, right, false) } Operator::IsNotDistinctFrom => { let left = self.expr_to_sql_inner(left.as_ref())?; let right = self.expr_to_sql_inner(right.as_ref())?; self.distinct_from_to_sql(left, right, true) } ``` Exact naming is flexible. A clearer enum than `negated: bool` would also be reasonable if preferred. ## Expected benefits - Keeps null-safe equality and its negation in one abstraction. - Reduces duplicate control flow in a large unparser function. - Makes precedence-sensitive parenthesization harder to get wrong. - Simplifies future dialect additions or syntax changes. - Keeps the refactor local to `expr.rs` with no public API change. ## Suggested tests Add or keep tests covering: 1. Default/full-text dialect: - `c1 IS DISTINCT FROM true` - `c1 IS NOT DISTINCT FROM true` 2. MySQL dialect: - `IS NOT DISTINCT FROM` unparses to `<=>` - `IS DISTINCT FROM` unparses as `NOT (<=>)` or another unambiguous equivalent, not ambiguous `NOT a <=> b` 3. If parenthesization changes, add a precedence-sensitive MySQL case verifying output remains unambiguous (for example, ``NOT (`c1` <=> true)``, not ``NOT `c1` <=> true``). ## Related PR #22999 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
