This is an automated email from the ASF dual-hosted git repository.
gengliangwang pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 57adc83df58b [SPARK-57198][SQL] Skip the divide-by-zero check in
codegen when the divisor is a non-zero literal
57adc83df58b is described below
commit 57adc83df58be4e6427defb631c1108ce921bfae
Author: Gengliang Wang <[email protected]>
AuthorDate: Wed Jun 3 08:16:13 2026 -0700
[SPARK-57198][SQL] Skip the divide-by-zero check in codegen when the
divisor is a non-zero literal
### What changes were proposed in this pull request?
`DivModLike` (`Divide` / `Remainder` / `IntegralDivide`) and `Pmod` always
emit a divide-by-zero guard in their generated code:
```java
if (divisor == 0) {
throw QueryExecutionErrors.divideByZeroError(...); // ANSI; or: isNull =
true;
} else {
... result ...
}
```
When the divisor is a foldable, non-null, non-zero constant, that guard is
dead code. This PR detects that case (`divisorIsNonZero`) and emits only the
division/remainder body, dropping the check. The `errCtx` error-context value
is also made `lazy` so its constant-pool reference is no longer registered when
the (now-skipped) check is the only thing that would have used it.
Example: for `col / 100.0` in ANSI mode, the generated code goes from
```java
if (100.0D == 0) {
throw QueryExecutionErrors.divideByZeroError(((SQLQueryContext)
references[1] /* errCtx */));
} else {
project_value = (double)(col / 100.0D);
}
```
to
```java
project_value = (double)(col / 100.0D);
```
The dead-check elimination only fires when the divisor is statically
non-zero; a zero literal divisor (which always errors / returns null) and any
variable divisor keep the existing check.
### Why are the changes needed?
This is a sub-task of SPARK-56908 (reduce generated Java size in
whole-stage codegen). Dumping the whole-stage codegen of the TPC-DS queries
shows 56 occurrences of the dead `if (100.0D == 0) throw
QueryExecutionErrors.divideByZeroError(...)` check across 14 queries
(percentage computations such as `x / 100.0`). Each is unreachable source plus
an unreachable `references[]` / constant-pool entry. Removing them shrinks the
generated code and eases constant-pool pressure with no behavior [...]
### Does this PR introduce _any_ user-facing change?
No. The generated code for a non-zero constant divisor is smaller but
computes the same result; behavior for zero and variable divisors is unchanged.
### How was this patch tested?
- Added `ArithmeticExpressionSuite` test "SPARK-57198: skip the
divide-by-zero check when the divisor is a non-zero literal", asserting the
by-zero check is dropped for a non-zero literal divisor (Divide / Remainder /
IntegralDivide / Pmod) and kept for a variable divisor. Existing tests (e.g.
SPARK-33008) continue to cover the zero-literal-divisor error path.
- Verified end-to-end by re-dumping the TPC-DS whole-stage codegen: the 56
dead `if (100.0D == 0)` checks dropped to 0, and all generated subtrees still
compile.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
Closes #56253 from gengliangwang/spark-divide-skip-zerocheck.
Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 142df0b9313abd32022bdb5b9c41b38e536934d9)
Signed-off-by: Gengliang Wang <[email protected]>
---
.../sql/catalyst/expressions/arithmetic.scala | 89 ++++++++++++++++------
.../expressions/ArithmeticExpressionSuite.scala | 35 +++++++++
2 files changed, 99 insertions(+), 25 deletions(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index cb7ba16aeb81..88b409c4e867 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -653,6 +653,15 @@ trait DivModLike extends BinaryArithmetic {
case _ => x => x == 0
}
+ // Whether the divisor is a non-null and non-zero literal (foldable divisors
are already folded
+ // to a literal before codegen). When true, the divide-by-zero check is dead
code (it can never
+ // trigger), so codegen skips emitting both the check and its
otherwise-unreachable error-context
+ // reference.
+ private lazy val divisorIsNonZero: Boolean = right match {
+ case Literal(divisor, _) => divisor != null && !isZero(divisor)
+ case _ => false
+ }
+
final override def eval(input: InternalRow): Any = {
// evaluate right first as we have a chance to skip left if right is 0
val input2 = right.eval(input)
@@ -705,7 +714,9 @@ trait DivModLike extends BinaryArithmetic {
s"${eval2.value} == 0"
}
val javaType = CodeGenerator.javaType(dataType)
- val errorContextCode = getContextOrNullCode(ctx, failOnError)
+ // Lazy so the error-context reference is only registered when actually
emitted; a statically
+ // non-zero divisor (see divisorIsNonZero) skips the divide-by-zero check
that would use it.
+ lazy val errorContextCode = getContextOrNullCode(ctx, failOnError)
val operation = super.dataType match {
case DecimalType.Fixed(precision, scale) =>
val castUtils = classOf[CastUtils].getName
@@ -741,25 +752,34 @@ trait DivModLike extends BinaryArithmetic {
// evaluate right first as we have a chance to skip left if right is 0
if (!left.nullable && !right.nullable) {
- val divByZero = if (failOnError) {
- s"throw ${divideByZeroErrorCode(ctx)};"
+ val divisionBody =
+ s"""
+ |${eval1.code}
+ |$checkIntegralDivideOverflow
+ |$operation""".stripMargin
+ // A statically non-zero divisor makes the zero check dead code, so emit
only the division.
+ val guardedBody = if (divisorIsNonZero) {
+ divisionBody
} else {
- s"${ev.isNull} = true;"
+ val divByZero = if (failOnError) {
+ s"throw ${divideByZeroErrorCode(ctx)};"
+ } else {
+ s"${ev.isNull} = true;"
+ }
+ s"""
+ |if ($isZero) {
+ | $divByZero
+ |} else {$divisionBody
+ |}""".stripMargin
}
ev.copy(code = code"""
${eval2.code}
boolean ${ev.isNull} = false;
$javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
- if ($isZero) {
- $divByZero
- } else {
- ${eval1.code}
- $checkIntegralDivideOverflow
- $operation
- }""")
+ $guardedBody""")
} else {
- val nullOnErrorCondition = if (failOnError) "" else s" || $isZero"
- val failOnErrorBranch = if (failOnError) {
+ val nullOnErrorCondition = if (failOnError || divisorIsNonZero) "" else
s" || $isZero"
+ val failOnErrorBranch = if (failOnError && !divisorIsNonZero) {
s"if ($isZero) throw ${divideByZeroErrorCode(ctx)};"
} else {
""
@@ -1079,6 +1099,14 @@ case class Pmod(
case _ => x => x == 0
}
+ // Whether the divisor is a non-null and non-zero literal (foldable divisors
are already folded
+ // to a literal before codegen). When true, the remainder-by-zero check is
dead code, so codegen
+ // skips emitting it.
+ private lazy val divisorIsNonZero: Boolean = right match {
+ case Literal(divisor, _) => divisor != null && !isZero(divisor)
+ case _ => false
+ }
+
private lazy val pmodFunc: (Any, Any) => Any = dataType match {
case _: IntegerType => (l, r) => MathUtils.pmod(l.asInstanceOf[Int],
r.asInstanceOf[Int])
case _: LongType => (l, r) => MathUtils.pmod(l.asInstanceOf[Long],
r.asInstanceOf[Long])
@@ -1119,7 +1147,9 @@ case class Pmod(
}
val remainder = ctx.freshName("remainder")
val javaType = CodeGenerator.javaType(dataType)
- val errorContext = getContextOrNullCode(ctx)
+ // Lazy so the error-context reference is only registered when actually
emitted; a statically
+ // non-zero divisor (see divisorIsNonZero) skips the remainder-by-zero
check that would use it.
+ lazy val errorContext = getContextOrNullCode(ctx)
val mathUtils = MathUtils.getClass.getCanonicalName.stripSuffix("$")
val result = dataType match {
case DecimalType.Fixed(precision, scale) =>
@@ -1146,24 +1176,33 @@ case class Pmod(
// evaluate right first as we have a chance to skip left if right is 0
if (!left.nullable && !right.nullable) {
- val divByZero = if (failOnError) {
- s"throw QueryExecutionErrors.remainderByZeroError($errorContext);"
+ val remainderBody =
+ s"""
+ |${eval1.code}
+ |$result""".stripMargin
+ // A statically non-zero divisor makes the zero check dead code, so emit
only the remainder.
+ val guardedBody = if (divisorIsNonZero) {
+ remainderBody
} else {
- s"${ev.isNull} = true;"
+ val divByZero = if (failOnError) {
+ s"throw QueryExecutionErrors.remainderByZeroError($errorContext);"
+ } else {
+ s"${ev.isNull} = true;"
+ }
+ s"""
+ |if ($isZero) {
+ | $divByZero
+ |} else {$remainderBody
+ |}""".stripMargin
}
ev.copy(code = code"""
${eval2.code}
boolean ${ev.isNull} = false;
$javaType ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
- if ($isZero) {
- $divByZero
- } else {
- ${eval1.code}
- $result
- }""")
+ $guardedBody""")
} else {
- val nullOnErrorCondition = if (failOnError) "" else s" || $isZero"
- val failOnErrorBranch = if (failOnError) {
+ val nullOnErrorCondition = if (failOnError || divisorIsNonZero) "" else
s" || $isZero"
+ val failOnErrorBranch = if (failOnError && !divisorIsNonZero) {
s"if ($isZero) throw
QueryExecutionErrors.remainderByZeroError($errorContext);"
} else {
""
diff --git
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
index 7d659fca5df2..f8cdf825bc4c 100644
---
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
+++
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
@@ -912,6 +912,41 @@ class ArithmeticExpressionSuite extends SparkFunSuite with
ExpressionEvalHelper
}
}
+ test("SPARK-57198: skip the divide-by-zero check when the divisor is a
non-zero literal") {
+ def codeOf(e: Expression): String = e.genCode(new
CodegenContext).code.toString
+
+ Seq(true, false).foreach { ansi =>
+ withSQLConf(SQLConf.ANSI_ENABLED.key -> ansi.toString) {
+ // A non-zero literal divisor makes the zero check dead code; codegen
must not emit it.
+ Seq(
+ Divide(Literal(1.0), Literal(2.0)),
+ Remainder(Literal(7), Literal(3)),
+ IntegralDivide(Literal(7L), Literal(3L)),
+ Pmod(Literal(7), Literal(3))).foreach { e =>
+ val code = codeOf(e)
+ assert(!code.contains("ByZeroError"),
+ s"expected no by-zero check for a non-zero literal divisor in
$e:\n$code")
+ }
+
+ // A variable (non-foldable) divisor keeps the check: in ANSI mode the
by-zero error must
+ // still be generated.
+ Seq(
+ Divide(
+ BoundReference(0, DoubleType, nullable = false),
+ BoundReference(1, DoubleType, nullable = false)),
+ Pmod(
+ BoundReference(0, IntegerType, nullable = false),
+ BoundReference(1, IntegerType, nullable = false))).foreach { e =>
+ val code = codeOf(e)
+ if (ansi) {
+ assert(code.contains("ByZeroError"),
+ s"expected a by-zero check for a variable divisor in $e:\n$code")
+ }
+ }
+ }
+ }
+ }
+
test("SPARK-34677: exact add and subtract of day-time and year-month
intervals") {
Seq(EvalMode.ANSI, EvalMode.LEGACY).foreach { evalMode =>
checkExceptionInExpression[ArithmeticException](
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]