[
https://issues.apache.org/jira/browse/HIVE-27112?focusedWorklogId=848332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848332
]
ASF GitHub Bot logged work on HIVE-27112:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Mar/23 13:33
Start Date: 01/Mar/23 13:33
Worklog Time Spent: 10m
Work Description: okumin commented on code in PR #4090:
URL: https://github.com/apache/hive/pull/4090#discussion_r1121732045
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an array of the elements in array1 but not in array2.", extended =
+ "Example:\n" + " > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src LIMIT 1;\n"
+ + " [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept extends AbstractGenericUDFArrayBase {
+ static final int ARRAY2_IDX = 1;
+ private static final String FUNC_NAME = "ARRAY_EXCEPT";
+
+ public GenericUDFArrayExcept() {
+ super(FUNC_NAME, 2, 2, ObjectInspector.Category.LIST);
+ }
+
+ @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+ ObjectInspector defaultOI = super.initialize(arguments);
+ checkArgCategory(arguments, ARRAY2_IDX, ObjectInspector.Category.LIST, FUNC_NAME,
+ org.apache.hadoop.hive.serde.serdeConstants.LIST_TYPE_NAME); //Array1 is already getting validated in Parent class
+ return defaultOI;
+ }
+
+ @Override public Object evaluate(DeferredObject[] arguments) throws HiveException {
+ Object array = arguments[ARRAY_IDX].get();
+ if (array == null || arrayOI.getListLength(array) <= 0) {
+ return null;
+ }
+
+ List<?> retArray3 = ((ListObjectInspector) argumentOIs[ARRAY_IDX]).getList(array);
+ retArray3.removeAll(((ListObjectInspector) argumentOIs[ARRAY2_IDX]).getList(arguments[ARRAY2_IDX].get()));
Review Comment:
- I think we shouldn't mutate the list in place here, because
[ListObjectInspector](https://github.com/apache/hive/blob/rel/release-4.0.0-alpha-2/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardListObjectInspector.java#L116-L117)
can return the original object rather than a copy
- I think the original object can still be read in the same query, e.g. `SELECT
array1, ARRAY_EXCEPT(array1, array2)`
- We want test cases that cover such usage
- We also need to handle the case where array2 is null, because
`List#removeAll` throws an NPE
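
A minimal sketch of the two points above, written against plain `java.util` collections rather than Hive's ObjectInspector API. `ArrayExceptSketch` and `arrayExcept` are hypothetical names, not code from the PR. The null policy is an assumption: a null `array1` yields null and a null `array2` removes nothing, but returning NULL in both cases may be the better choice for the UDF. Deduplication follows the JIRA description ("without duplicates").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArrayExceptSketch {

  // Hypothetical helper (not part of Hive): elements of array1 that are not in array2.
  static List<Object> arrayExcept(List<?> array1, List<?> array2) {
    if (array1 == null) {
      return null; // assumption: a null first argument yields null
    }
    // Copy into a new collection so the caller's list is never mutated.
    // LinkedHashSet keeps first-seen order and drops duplicates.
    Set<Object> result = new LinkedHashSet<>(array1);
    if (array2 != null) { // guard: removeAll(null) would throw a NullPointerException
      result.removeAll(array2);
    }
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    // An unmodifiable input would throw if the helper tried to mutate it.
    List<Integer> array1 = Collections.unmodifiableList(Arrays.asList(1, 2, 2, 3));
    List<Integer> array2 = Arrays.asList(1, 1, 3, 5);

    System.out.println(arrayExcept(array1, array2)); // [2]
    System.out.println(arrayExcept(array1, null));   // [1, 2, 3]
    System.out.println(array1);                      // [1, 2, 2, 3], unchanged
  }
}
```

In the UDF itself the same idea would apply to the list returned by `getList`: copy it before removing elements, so that a query selecting both `array1` and `ARRAY_EXCEPT(array1, array2)` still sees the original `array1`.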
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an array of the elements in array1 but not in array2.", extended =
+ "Example:\n" + " > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src LIMIT 1;\n"
+ + " [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept extends AbstractGenericUDFArrayBase {
Review Comment:
- How about changing the signature from `_FUNC_(array, value)` to
`_FUNC_(array1, array2)`?
- The line breaks look wrong
- We have to remove `@NDV(maxNdv = 2)` because it declares that the function
returns at most 2 distinct values. This UDF can return arbitrary arrays, so the
number of distinct values is unbounded
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an array of the elements in array1 but not in array2.", extended =
+ "Example:\n" + " > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src LIMIT 1;\n"
+ + " [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept extends AbstractGenericUDFArrayBase {
+ static final int ARRAY2_IDX = 1;
+ private static final String FUNC_NAME = "ARRAY_EXCEPT";
+
+ public GenericUDFArrayExcept() {
+ super(FUNC_NAME, 2, 2, ObjectInspector.Category.LIST);
+ }
+
+ @Override public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
+ ObjectInspector defaultOI = super.initialize(arguments);
+ checkArgCategory(arguments, ARRAY2_IDX, ObjectInspector.Category.LIST, FUNC_NAME,
+ org.apache.hadoop.hive.serde.serdeConstants.LIST_TYPE_NAME); //Array1 is already getting validated in Parent class
Review Comment:
Should we check that the element type of array1 matches the element type of
array2? Allowing type conversions here might also be OK.
Issue Time Tracking
-------------------
Worklog Id: (was: 848332)
Time Spent: 20m (was: 10m)
> implement array_except UDF in Hive
> ----------------------------------
>
> Key: HIVE-27112
> URL: https://issues.apache.org/jira/browse/HIVE-27112
> Project: Hive
> Issue Type: Sub-task
> Reporter: Taraka Rama Rao Lethavadla
> Assignee: Taraka Rama Rao Lethavadla
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> *array_except(array1, array2)*
> Returns an array of the elements in {{array1}} but not in {{array2}}, without
> duplicates.
>
> {noformat}
> > SELECT array_except(array(1, 2, 2, 3), array(1, 1, 3, 5));
> [2]
> {noformat}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)