martin-g commented on code in PR #2747:
URL: https://github.com/apache/datafusion-comet/pull/2747#discussion_r2514662807


##########
docs/source/user-guide/latest/compatibility.md:
##########
@@ -47,11 +47,10 @@ Spark normalizes NaN and zero for floating point numbers for several cases. See
 However, one exception is comparison. Spark does not normalize NaN and zero when comparing values because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)).
-So Comet will add additional normalization expression of NaN and zero for comparison.
-
-Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
-Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users
-and sorting on floating-point data can be enabled by setting `spark.comet.expression.SortOrder.allowIncompatible=true`.
+So Comet will add additional normalization expression of NaN and zero for comparison, and may still have differences

Review Comment:
```suggestion
So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences
```

##########
common/src/main/scala/org/apache/comet/CometConf.scala:
##########
@@ -672,6 +672,14 @@ object CometConf extends ShimCometConf {
       .booleanConf
       .createWithDefault(false)
 
+  val COMET_EXEC_STRICT_FLOATING_POINT: ConfigEntry[Boolean] =
+    conf("spark.comet.exec.strictFloatingPoint")
+      .category(CATEGORY_EXEC)
+      .doc("When enabled, fall back to Spark for floating-point operations that differ from " +

Review Comment:
```suggestion
      .doc("When enabled, fall back to Spark for floating-point operations that may differ from " +
```

##########
spark/src/main/scala/org/apache/comet/serde/CometAggregate.scala:
##########
@@ -133,8 +133,15 @@
     val aggExprs = aggregateExpressions.map(aggExprToProto(_, output, binding, aggregate.conf))
 
-    if (childOp.nonEmpty && groupingExprs.forall(_.isDefined) &&
-      aggExprs.forall(_.isDefined)) {
+    if (aggExprs.exists(_.isEmpty)) {
+      withInfo(
+        aggregate,
+        "Unsupported aggregate expression",

Review Comment:
```suggestion
        "Unsupported aggregate expression(s)",
```

##########
spark/src/main/scala/org/apache/comet/serde/aggregates.scala:
##########
@@ -78,6 +89,16 @@ object CometMax extends CometAggregateExpressionSerde[Max] {
       withInfo(aggExpr, s"Unsupported data type: ${expr.dataType}")
       return None
     }
+
+    if (expr.dataType == DataTypes.FloatType || expr.dataType == DataTypes.DoubleType) {
+      if (CometConf.COMET_EXEC_STRICT_FLOATING_POINT.get()) {

Review Comment:
Same code as at line 47. Could be extracted to a helper method.
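The last review comment asks for the duplicated strict-floating-point check to be shared between serdes. A minimal sketch of what that extraction could look like, assuming the check at line 47 (presumably in `CometMin`) mirrors the `CometMax` hunk above; the object and method names here are hypothetical, not from the PR:

```scala
package org.apache.comet.serde

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
import org.apache.spark.sql.types.DataTypes

import org.apache.comet.CometConf
import org.apache.comet.CometSparkSessionExtensions.withInfo

// Hypothetical helper consolidating the strict-floating-point check that the
// PR duplicates in CometMin and CometMax.
private[serde] object StrictFloatingPoint {

  /** Returns true when serde should fall back to Spark for this aggregate. */
  def shouldFallBack(aggExpr: AggregateExpression, expr: Expression): Boolean = {
    val isFloatingPoint =
      expr.dataType == DataTypes.FloatType || expr.dataType == DataTypes.DoubleType
    if (isFloatingPoint && CometConf.COMET_EXEC_STRICT_FLOATING_POINT.get()) {
      // Record the fallback reason so it surfaces in Comet's explain output.
      withInfo(aggExpr, s"strict floating-point mode is enabled for ${expr.dataType}")
      true
    } else {
      false
    }
  }
}
```

Both call sites would then reduce to a single line such as `if (StrictFloatingPoint.shouldFallBack(aggExpr, expr)) return None`.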

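For context on the new flag itself, a hedged usage sketch, assuming the key `spark.comet.exec.strictFloatingPoint` lands as shown in the `CometConf.scala` hunk (table and column names below are placeholders):

```scala
// Sketch: opting in to strict floating-point mode from a Spark session.
// The default is false, per createWithDefault(false) in the hunk above.
spark.conf.set("spark.comet.exec.strictFloatingPoint", "true")

// With the flag on, aggregates such as MIN/MAX over FloatType/DoubleType
// inputs fall back to Spark rather than running natively in Comet.
spark.sql("SELECT MAX(price) FROM items").show()
```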