hanyuzheng7 commented on code in PR #23173: URL: https://github.com/apache/flink/pull/23173#discussion_r1492749138
########## flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/functions/scalar/ArrayExceptFunction.java: ########## @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.runtime.functions.scalar; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.Expressions; +import org.apache.flink.table.data.ArrayData; +import org.apache.flink.table.data.GenericArrayData; +import org.apache.flink.table.functions.BuiltInFunctionDefinitions; +import org.apache.flink.table.functions.FunctionContext; +import org.apache.flink.table.functions.SpecializedFunction; +import org.apache.flink.table.types.CollectionDataType; +import org.apache.flink.table.types.DataType; +import org.apache.flink.util.FlinkRuntimeException; + +import javax.annotation.Nullable; + +import java.lang.invoke.MethodHandle; +import java.util.ArrayList; +import java.util.HashSet; +import java.util.List; +import java.util.Set; + +import static org.apache.flink.table.api.Expressions.$; + +/** Implementation of {@link BuiltInFunctionDefinitions#ARRAY_EXCEPT}. 
*/ +@Internal +public class ArrayExceptFunction extends BuiltInScalarFunction { + private final ArrayData.ElementGetter elementGetter; + private final SpecializedFunction.ExpressionEvaluator hashcodeEvaluator; + private final SpecializedFunction.ExpressionEvaluator equalityEvaluator; + private transient MethodHandle hashcodeHandle; + + private transient MethodHandle equalityHandle; + + public ArrayExceptFunction(SpecializedFunction.SpecializedContext context) { + super(BuiltInFunctionDefinitions.ARRAY_EXCEPT, context); + final DataType dataType = + ((CollectionDataType) context.getCallContext().getArgumentDataTypes().get(0)) + .getElementDataType() + .toInternal(); + elementGetter = ArrayData.createElementGetter(dataType.toInternal().getLogicalType()); + hashcodeEvaluator = + context.createEvaluator( + Expressions.call("$HASHCODE$1", $("element1")), + DataTypes.INT(), + DataTypes.FIELD("element1", dataType.notNull().toInternal())); + equalityEvaluator = + context.createEvaluator( + $("element1").isEqual($("element2")), + DataTypes.BOOLEAN(), + DataTypes.FIELD("element1", dataType.notNull().toInternal()), + DataTypes.FIELD("element2", dataType.notNull().toInternal())); + } + + @Override + public void open(FunctionContext context) throws Exception { + hashcodeHandle = hashcodeEvaluator.open(context); + equalityHandle = equalityEvaluator.open(context); + } + + public @Nullable ArrayData eval(ArrayData arrayOne, ArrayData arrayTwo) { + try { + if (arrayOne == null) { + return null; + } + + List<Object> list = new ArrayList<>(); + Set<ObjectContainer> seen = new HashSet<>(); + + boolean isNullPresentInArrayTwo = false; + if (arrayTwo != null) { + for (int pos = 0; pos < arrayTwo.size(); pos++) { + final Object element = elementGetter.getElementOrNull(arrayTwo, pos); + if (element == null) { + isNullPresentInArrayTwo = true; + } else { + ObjectContainer objectContainer = new ObjectContainer(element); + seen.add(objectContainer); + } + } + } + boolean 
isNullPresentInArrayOne = false; + for (int pos = 0; pos < arrayOne.size(); pos++) { + final Object element = elementGetter.getElementOrNull(arrayOne, pos); + if (element == null) { + isNullPresentInArrayOne = true; + } else { + ObjectContainer objectContainer = new ObjectContainer(element); + if (!seen.contains(objectContainer)) { + seen.add(objectContainer); + list.add(element); + } + } + } + if (!isNullPresentInArrayTwo && isNullPresentInArrayOne) { + list.add(null); + }

Review Comment:
I've researched how different databases handle the array_except function. Here's what I found:

- Snowflake: The function returns an array containing elements from the source array that are not in the array of elements to exclude. If no elements remain after exclusion, an empty array is returned. If either argument is NULL, the function returns NULL. The order of values in the returned array is unspecified. Documentation: https://docs.snowflake.com/en/sql-reference/functions/array_except
- Databricks: The documentation does not specify how NULL values are handled or the order of the returned array. Documentation: https://docs.databricks.com/en/sql/language-manual/functions/array_except.html
- Spark: As with Databricks, the documentation does not explain NULL handling or the order of the returned array. Documentation: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.array_except.html
- PrestoDB: Again, neither the handling of NULL values nor the order of the returned array is described. Documentation: https://prestodb.io/docs/current/search.html?q=array_except#
- Doris: The order of the returned array is specified to be the same as array1. If any input array is NULL, the function returns NULL. Documentation: https://doris.apache.org/docs/1.2/sql-manual/sql-functions/array-functions/array_except/

Based on these findings, it seems prudent to align with either Snowflake or Doris, both of which return NULL if any input array is NULL.
For the order of the values within the returned array, we could follow Snowflake and leave the order unspecified. Doris, however, keeps the order of array1, and adopting that behavior would make the results clearer and more predictable.

Which approach do you think is more reasonable for determining the order of elements in the output array?
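
To make the Doris-style option concrete, here is a minimal sketch in plain Java. It is not the Flink implementation: `ArrayExceptSketch` and `arrayExcept` are hypothetical names, `java.util.List` stands in for `ArrayData`, and element equality uses `equals`/`hashCode` rather than the generated evaluators. It assumes the result is deduplicated, as in the `eval` method above, and that a NULL element is treated like any other value and kept at its position in array1 instead of being appended at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ArrayExceptSketch {

    /**
     * Hypothetical helper illustrating the proposed semantics:
     * returns null if either input is null, otherwise the distinct
     * elements of arrayOne that are absent from arrayTwo, in arrayOne order.
     */
    public static <T> List<T> arrayExcept(List<T> arrayOne, List<T> arrayTwo) {
        if (arrayOne == null || arrayTwo == null) {
            return null; // NULL input array yields NULL (Snowflake and Doris)
        }
        // HashSet accepts a null element, so NULL entries need no special case.
        Set<T> seen = new HashSet<>(arrayTwo);
        List<T> result = new ArrayList<>();
        for (T element : arrayOne) {
            // add() returns true only for values not yet excluded or emitted.
            if (seen.add(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> out = arrayExcept(Arrays.asList(3, null, 1, 3, 2), Arrays.asList(1));
        System.out.println(out); // prints [3, null, 2]
    }
}
```

With unspecified order (the Snowflake option), the same loop would still be valid, but callers could not rely on the position of any element in the result.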