zabetak commented on code in PR #6402:
URL: https://github.com/apache/hive/pull/6402#discussion_r3021970427
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+ `data` array<struct<jobs:array<struct<code:string,label:string>>>>
+) STORED AS ORC;
+
+INSERT INTO TABLE cbo_component_access_if_tbl
+SELECT array(
+ named_struct(
+ 'jobs', array(
+ named_struct('code', 'j1', 'label', 'l1'),
+ named_struct('code', 'j2', 'label', 'l2')
+ )
+ )
+);
+
+-- `data` is both the column name and the LATERAL VIEW table alias (matches
common real-world queries).
+-- Expression `data.dat.jobs.code` forces Calcite to introduce
COMPONENT_ACCESS for array-of-struct field access.
+-- `if(...)` is translated to CASE, and
HiveFunctionHelper.checkForStatefulFunctions walks the Rex tree.
+SELECT
+ if(concat_ws(',', data.dat.jobs.code) = '', null, concat_ws(',',
data.dat.jobs.code)) AS jobs_codes
+FROM cbo_component_access_if_tbl t
+LATERAL VIEW explode(t.`data`) `data` AS `dat`;
Review Comment:
To verify the fix EXPLAIN CBO should suffice since we don't care about the
actual execution of the query. In addition some elements seem redundant like
`concat_ws`, `explode`, and `LATERAL VIEW` so please simplify the case as much
as possible.
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
Review Comment:
Cleanup is taken care of by the test framework so no need to include this.
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
Review Comment:
NVL appears nowhere inside this file so please use a more descriptive file
name.
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+ `data` array<struct<jobs:array<struct<code:string,label:string>>>>
+) STORED AS ORC;
+
+INSERT INTO TABLE cbo_component_access_if_tbl
+SELECT array(
+ named_struct(
+ 'jobs', array(
+ named_struct('code', 'j1', 'label', 'l1'),
+ named_struct('code', 'j2', 'label', 'l2')
+ )
+ )
+);
Review Comment:
The problem is a compilation error so we don't care about the data thus
INSERT can be removed. This makes the test faster and more minimal avoiding
potential unrelated failures in the future.
##########
ql/src/test/org/apache/hadoop/hive/ql/parse/type/TestHiveFunctionHelper.java:
##########
@@ -62,4 +64,24 @@ public void testGetUDTFFunctionThrowingException() throws
SemanticException {
// 'upper' is not a udtf so should throw exception
functionHelper.getUDTFFunction("upper", operands);
}
+
+ @Test
+ public void testCoalesceWithComponentAccessDoesNotAssert() throws
SemanticException {
Review Comment:
Since we use NVL rename to `testNVL...`
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
Review Comment:
Comment is not strictly necessary but if we want put something let's simply
put the Jira ID plus summary and nothing more.
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
Review Comment:
This is the default value so no need to set it explicitly.
##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+ `data` array<struct<jobs:array<struct<code:string,label:string>>>>
Review Comment:
Probably one level of nesting should suffice to repro the problem don't need
to make the DDL overly complex.
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -346,8 +347,14 @@ public Void visitCall(final RexCall call) {
GenericUDF nodeUDF = SqlFunctionConverter.getHiveUDF(
call.getOperator(), call.getType(), call.getOperands().size());
if (nodeUDF == null) {
+ // Some internal Calcite operators (e.g. COMPONENT_ACCESS used for
nested field access over
+ // collections) are not backed by a Hive GenericUDF. They are not
stateful and should not
+ // fail query compilation; still recurse into operands to find any
stateful expressions.
+ if (call.getOperator() == HiveComponentAccess.COMPONENT_ACCESS) {
+ return super.visitCall(call);
+ }
throw new AssertionError("Cannot retrieve function " +
call.getOperator().getName()
- + " within StatefulFunctionsChecker");
+ + " (kind=" + call.getOperator().getKind() + ") within
StatefulFunctionsChecker");
}
Review Comment:
The proposed fix addresses the problem for `COMPONENT_ACCESS` operator but
we may hit it again for some other internal operator that does not have a Hive
mapping. If that happens we will have to update the code here and add more
if/else statements which makes the code a bit brittle.
Personally, I would get rid of the `AssertionError` and I would opt for a
more general change that does not depend on the specific Calcite operator.
```java
if (nodeUDF != null && FunctionRegistry.isStateful(nodeUDF)) {
throw new Util.FoundOne(call);
}
```
In other words when nodeUDF is null (i.e., Hive mapping is missing) then we
assume that the operator is not stateful similar to what happens in
`HiveFunctionHelper.isStateful(FunctionInfo)`. This should be fine since
`UDFType#stateful` is false by default so in general when appropriate info is
missing we assume that that functions/operators are not statefull.
Moreover, there are very few (only 2 in the entire Hive codebase) actual
usages of `@UDFType(stateful = true)` so the specific annotations does not seem
widely adopted/used.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]