kosiew commented on code in PR #20595:
URL: https://github.com/apache/datafusion/pull/20595#discussion_r2923071875


##########
datafusion/spark/src/function/math/isnan.rs:
##########
@@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::any::Any;
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, BooleanArray, Float32Array, Float64Array};
+use arrow::datatypes::DataType;
+use datafusion_common::utils::take_function_args;
+use datafusion_common::{Result, ScalarValue, exec_err};
+use datafusion_expr::{
+    ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, TypeSignature,
+    Volatility,
+};
+
+/// Spark-compatible `isnan` expression
+/// <https://spark.apache.org/docs/latest/api/sql/index.html#isnan>
+///
+/// Differences with DataFusion isnan:
+///  - Spark returns `false` for NULL inputs; DataFusion propagates NULL
+///  - Spark only accepts Float32 and Float64; DataFusion accepts all numeric
+///    types (returning false for integers and decimals, which are never NaN)
+#[derive(Debug, PartialEq, Eq, Hash)]
+pub struct SparkIsNaN {
+    signature: Signature,
+}
+
+impl Default for SparkIsNaN {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl SparkIsNaN {
+    pub fn new() -> Self {
+        Self {
+            signature: Signature::one_of(
+                vec![
+                    TypeSignature::Exact(vec![DataType::Float32]),
+                    TypeSignature::Exact(vec![DataType::Float64]),

Review Comment:
   Does this mean `SELECT isnan(NULL)` will be rejected during planning because 
the literal has `Null` type and there is no coercion path?
    
   That is a semantic gap from Spark, where `isnan(NULL)` returns `false`.



##########
datafusion/spark/src/function/math/isnan.rs:
##########
@@ -0,0 +1,245 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::any::Any;
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, BooleanArray, Float32Array, Float64Array};
+use arrow::datatypes::DataType;
+use datafusion_common::utils::take_function_args;
+use datafusion_common::{Result, ScalarValue, exec_err};
+use datafusion_expr::{
+    ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, TypeSignature,
+    Volatility,
+};
+
+/// Spark-compatible `isnan` expression
+/// <https://spark.apache.org/docs/latest/api/sql/index.html#isnan>
+///
+/// Differences with DataFusion isnan:
+///  - Spark returns `false` for NULL inputs; DataFusion propagates NULL
+///  - Spark only accepts Float32 and Float64; DataFusion accepts all numeric
+///    types (returning false for integers and decimals, which are never NaN)
+#[derive(Debug, PartialEq, Eq, Hash)]
+pub struct SparkIsNaN {
+    signature: Signature,
+}
+
+impl Default for SparkIsNaN {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl SparkIsNaN {
+    pub fn new() -> Self {
+        Self {
+            signature: Signature::one_of(
+                vec![
+                    TypeSignature::Exact(vec![DataType::Float32]),
+                    TypeSignature::Exact(vec![DataType::Float64]),
+                ],
+                Volatility::Immutable,
+            ),
+        }
+    }
+}
+
+impl ScalarUDFImpl for SparkIsNaN {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+
+    fn name(&self) -> &str {
+        "isnan"
+    }
+
+    fn signature(&self) -> &Signature {
+        &self.signature
+    }
+
+    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
+        Ok(DataType::Boolean)
+    }
+
+    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> 
Result<ColumnarValue> {
+        spark_isnan(&args.args)
+    }
+}
+
+fn spark_isnan(args: &[ColumnarValue]) -> Result<ColumnarValue> {
+    let [value] = take_function_args("isnan", args)?;
+
+    match value {

Review Comment:
   The scalar and array paths each repeat the same "float32 vs float64, then 
map `is_nan`, then patch nulls" structure.
   
   A small helper here would make the implementation easier to scan and keep 
future changes aligned across both types.



##########
datafusion/sqllogictest/test_files/spark/math/isnan.slt:
##########
@@ -0,0 +1,72 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+
+#   http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This file is part of the implementation of the datafusion-spark function 
library.
+# For more information, please see:
+#   https://github.com/apache/datafusion/issues/15914
+
+# Tests for Spark-compatible isnan function.
+# Spark semantics differ from DataFusion's built-in isnan:
+#   Spark returns false for NULL inputs; DataFusion returns NULL.
+#
+# Example: SELECT isnan(NULL::DOUBLE)
+#   Spark:      returns false
+#   DataFusion: returns NULL
+
+# Scalar input: float64
+query BBBBB
+SELECT isnan(1.0::DOUBLE), isnan('NaN'::DOUBLE), isnan('inf'::DOUBLE), 
isnan(0.0::DOUBLE), isnan(-1.0::DOUBLE);

Review Comment:
   The happy-path coverage is solid.
   
   Can you also add one Spark-specific planner/error test such as `SELECT 
isnan(1)` to document that this Spark variant intentionally rejects 
non-floating numerics.
   
   That would make the behavioral difference from the built-in DataFusion 
`isnan` obvious to future readers and protect against accidental widening of 
the signature later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to