martin-g commented on code in PR #2747:
URL: https://github.com/apache/datafusion-comet/pull/2747#discussion_r2514662807


##########
docs/source/user-guide/latest/compatibility.md:
##########
@@ -47,11 +47,10 @@ Spark normalizes NaN and zero for floating point numbers for several cases. See
 However, one exception is comparison. Spark does not normalize NaN and zero when comparing values because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)).
-So Comet will add additional normalization expression of NaN and zero for comparison.
-
-Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
-Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users
-and sorting on floating-point data can be enabled by setting `spark.comet.expression.SortOrder.allowIncompatible=true`.
+So Comet will add additional normalization expression of NaN and zero for comparison, and may still have differences

Review Comment:
```suggestion
So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences
```

##########
common/src/main/scala/org/apache/comet/CometConf.scala:
##########
@@ -672,6 +672,14 @@ object CometConf extends ShimCometConf {
       .booleanConf
       .createWithDefault(false)
 
+  val COMET_EXEC_STRICT_FLOATING_POINT: ConfigEntry[Boolean] =
+    conf("spark.comet.exec.strictFloatingPoint")
+      .category(CATEGORY_EXEC)
+      .doc("When enabled, fall back to Spark for floating-point operations that differ from " +

Review Comment:
```suggestion
      .doc("When enabled, fall back to Spark for floating-point operations that may differ from " +
```

##########
spark/src/main/scala/org/apache/comet/serde/CometAggregate.scala:
##########
@@ -133,8 +133,15 @@
     val aggExprs = aggregateExpressions.map(aggExprToProto(_, output, binding, aggregate.conf))
 
-    if (childOp.nonEmpty && groupingExprs.forall(_.isDefined) &&
-      aggExprs.forall(_.isDefined)) {
+    if (aggExprs.exists(_.isEmpty)) {
+      withInfo(
+        aggregate,
+        "Unsupported aggregate expression",

Review Comment:
```suggestion
        "Unsupported aggregate expression(s)",
```

##########
spark/src/main/scala/org/apache/comet/serde/aggregates.scala:
##########
@@ -78,6 +89,16 @@ object CometMax extends CometAggregateExpressionSerde[Max] {
       withInfo(aggExpr, s"Unsupported data type: ${expr.dataType}")
       return None
     }
+
+    if (expr.dataType == DataTypes.FloatType || expr.dataType == DataTypes.DoubleType) {
+      if (CometConf.COMET_EXEC_STRICT_FLOATING_POINT.get()) {

Review Comment:
Same code as at line 47. Could be extracted to a helper method.
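The last review comment asks for the duplicated strict-floating-point check to be shared between serdes. A minimal sketch of what that extraction could look like, assuming the check at line 47 (presumably in `CometMin`) mirrors the `CometMax` hunk above; the object and method names here are hypothetical, not from the PR:

```scala
package org.apache.comet.serde

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
import org.apache.spark.sql.types.DataTypes

import org.apache.comet.CometConf
import org.apache.comet.CometSparkSessionExtensions.withInfo

// Hypothetical helper consolidating the strict-floating-point check that the
// PR duplicates in CometMin and CometMax.
private[serde] object StrictFloatingPoint {

  /** Returns true when serde should fall back to Spark for this aggregate. */
  def shouldFallBack(aggExpr: AggregateExpression, expr: Expression): Boolean = {
    val isFloatingPoint =
      expr.dataType == DataTypes.FloatType || expr.dataType == DataTypes.DoubleType
    if (isFloatingPoint && CometConf.COMET_EXEC_STRICT_FLOATING_POINT.get()) {
      // Record the fallback reason so it surfaces in Comet's explain output.
      withInfo(aggExpr, s"strict floating-point mode is enabled for ${expr.dataType}")
      true
    } else {
      false
    }
  }
}
```

Both call sites would then reduce to a single line such as `if (StrictFloatingPoint.shouldFallBack(aggExpr, expr)) return None`.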

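For context on the new flag itself, a hedged usage sketch, assuming the key `spark.comet.exec.strictFloatingPoint` lands as shown in the `CometConf.scala` hunk (table and column names below are placeholders):

```scala
// Sketch: opting in to strict floating-point mode from a Spark session.
// The default is false, per createWithDefault(false) in the hunk above.
spark.conf.set("spark.comet.exec.strictFloatingPoint", "true")

// With the flag on, aggregates such as MIN/MAX over FloatType/DoubleType
// inputs fall back to Spark rather than running natively in Comet.
spark.sql("SELECT MAX(price) FROM items").show()
```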