[ 
https://issues.apache.org/jira/browse/HIVE-27112?focusedWorklogId=848332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848332
 ]

ASF GitHub Bot logged work on HIVE-27112:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Mar/23 13:33
            Start Date: 01/Mar/23 13:33
    Worklog Time Spent: 10m 
      Work Description: okumin commented on code in PR #4090:
URL: https://github.com/apache/hive/pull/4090#discussion_r1121732045


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an 
array of the elements in array1 but not in array2.", extended =
+    "Example:\n" + "  > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src 
LIMIT 1;\n"
+        + "  [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept 
extends AbstractGenericUDFArrayBase {
+  static final int ARRAY2_IDX = 1;
+  private static final String FUNC_NAME = "ARRAY_EXCEPT";
+
+  public GenericUDFArrayExcept() {
+    super(FUNC_NAME, 2, 2, ObjectInspector.Category.LIST);
+  }
+
+  @Override public ObjectInspector initialize(ObjectInspector[] arguments) 
throws UDFArgumentException {
+    ObjectInspector defaultOI = super.initialize(arguments);
+    checkArgCategory(arguments, ARRAY2_IDX, ObjectInspector.Category.LIST, 
FUNC_NAME,
+        org.apache.hadoop.hive.serde.serdeConstants.LIST_TYPE_NAME); //Array1 
is already getting validated in Parent class
+    return defaultOI;
+  }
+
+  @Override public Object evaluate(DeferredObject[] arguments) throws 
HiveException {
+    Object array = arguments[ARRAY_IDX].get();
+    if (array == null || arrayOI.getListLength(array) <= 0) {
+      return null;
+    }
+
+    List<?> retArray3 = ((ListObjectInspector) 
argumentOIs[ARRAY_IDX]).getList(array);
+    retArray3.removeAll(((ListObjectInspector) 
argumentOIs[ARRAY2_IDX]).getList(arguments[ARRAY2_IDX].get()));

Review Comment:
   - I think we shouldn't mutate the list in place here, because 
[ListObjectInspector](https://github.com/apache/hive/blob/rel/release-4.0.0-alpha-2/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardListObjectInspector.java#L116-L117)
 can return the original object rather than a copy
       - The same query can still read the original object, e.g. `SELECT 
array1, ARRAY_EXCEPT(array1, array2)`
       - We need test cases that cover such usage
   - We also need to handle the case where array2 is null, because 
`List#removeAll` throws an NPE
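
   A minimal sketch of the two points above, using plain `java.util` lists 
instead of Hive's ObjectInspector API. The class `ArrayExceptSketch` and the 
method `except` are hypothetical names, not part of the PR, and returning a 
copy of array1 when array2 is null is only an assumption about the desired 
behavior:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the PR's implementation.
final class ArrayExceptSketch {

  // Returns the elements of array1 that are not in array2.
  // Neither input list is modified.
  static List<Object> except(List<?> array1, List<?> array2) {
    if (array1 == null) {
      return null;
    }
    // Defensive copy: the caller's list may be the original row value.
    List<Object> result = new ArrayList<>(array1);
    // Assumption: a null array2 removes nothing. Calling removeAll(null)
    // would throw a NullPointerException.
    if (array2 != null) {
      result.removeAll(array2);
    }
    return result;
  }
}
```

   Because the subtraction runs on a copy, a query that also selects `array1` 
would still see the unmodified input.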



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an 
array of the elements in array1 but not in array2.", extended =
+    "Example:\n" + "  > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src 
LIMIT 1;\n"
+        + "  [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept 
extends AbstractGenericUDFArrayBase {

Review Comment:
   - How about changing the signature from `_FUNC_(array, value)` to 
`_FUNC_(array1, array2)`?
   - The line breaks here look wrong
   - We have to remove `@NDV(maxNdv = 2)`, because it declares that the number 
of distinct values is at most 2. This UDF can return arbitrary arrays, so the 
number of distinct values is unbounded



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayExcept.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.udf.generic;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * GenericUDFArrayExcept
+ */
+@Description(name = "array_except", value = "_FUNC_(array, value) - Returns an 
array of the elements in array1 but not in array2.", extended =
+    "Example:\n" + "  > SELECT _FUNC_(array(1, 2, 3,4), array(2,3)) FROM src 
LIMIT 1;\n"
+        + "  [1,4]") @NDV(maxNdv = 2) public class GenericUDFArrayExcept 
extends AbstractGenericUDFArrayBase {
+  static final int ARRAY2_IDX = 1;
+  private static final String FUNC_NAME = "ARRAY_EXCEPT";
+
+  public GenericUDFArrayExcept() {
+    super(FUNC_NAME, 2, 2, ObjectInspector.Category.LIST);
+  }
+
+  @Override public ObjectInspector initialize(ObjectInspector[] arguments) 
throws UDFArgumentException {
+    ObjectInspector defaultOI = super.initialize(arguments);
+    checkArgCategory(arguments, ARRAY2_IDX, ObjectInspector.Category.LIST, 
FUNC_NAME,
+        org.apache.hadoop.hive.serde.serdeConstants.LIST_TYPE_NAME); //Array1 
is already getting validated in Parent class

Review Comment:
   Should we check that the element type of array1 matches the element type of 
array2? It might be OK if we allow type conversions here.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 848332)
    Time Spent: 20m  (was: 10m)

> implement array_except UDF in Hive
> ----------------------------------
>
>                 Key: HIVE-27112
>                 URL: https://issues.apache.org/jira/browse/HIVE-27112
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Taraka Rama Rao Lethavadla
>            Assignee: Taraka Rama Rao Lethavadla
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *array_except(array1, array2)* 
> Returns an array of the elements in {{array1}} but not in {{array2}}, without 
> duplicates.
>  
> {noformat}
> > SELECT array_except(array(1, 2, 2, 3), array(1, 1, 3, 5));
> [2]
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
