[ https://issues.apache.org/jira/browse/IMPALA-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896400#comment-17896400 ]

ASF subversion and git services commented on IMPALA-13505:
----------------------------------------------------------

Commit 9e05ffcaaf9ed67dd3310af674d107a484aef7fa in impala's branch 
refs/heads/master from Jason Fehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9e05ffcaa ]

IMPALA-13505: Fix NPE in Calcite Planner

Fixes the NullPointerException occurring when using the Calcite
planner with
test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8.
The NPE was thrown from the Planner where it generates the list of
columns in the query for use in the profile and workload management.

Testing was accomplished by manually running the impacted test
and by adding a new custom cluster test that replicates the failing test.

Change-Id: I4d282120e596fd39a569d1ce9b25024f4f174dd0
Reviewed-on: http://gerrit.cloudera.org:8080/22033
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> NullPointerException in Analyzer.resolveActualPath with Calcite planner
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-13505
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13505
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Michael Smith
>            Assignee: Jason Fehr
>            Priority: Major
>              Labels: calcite
>             Fix For: Impala 4.5.0
>
>
> Encountered a NullPointerException when running some TPC-DS queries (such as 
> q8) with the Calcite planner:
> {code:java}
> Stack Trace: java.lang.NullPointerException
>         at org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
>         at java.util.Collections$2.tryAdvance(Collections.java:4719)
>         at java.util.Collections$2.forEachRemaining(Collections.java:4727)
>         at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
>         at org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
>         at org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
>         at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
>         at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>         at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
>         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
>         at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>         at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
>         at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
>         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
>         at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
>         at org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
>         at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
>         at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
>         at org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.<init>(ImpalaHashJoinNode.java:46)
> ...
> {code}
> {{SlotRef.getResolvedPath}} returns null at line 4699. Looking at the
> SlotRef, I don't see any way to determine its origin, so this may be part of
> the incomplete implementation of the Calcite planner integration.
> To reproduce:
> {code:java}
> $ start-impala-cluster.py -s 1 --use_calcite_planner=true
> $ impala-py.test tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
> {code}
> *Analysis:*
> The root issue with this particular query is that it contains a very lengthy
> [list of zip codes|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test#L12-L62]
> used in a WHERE clause. The Calcite planner produces this join
> node for that WHERE clause:
> {noformat}
> 05:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
> |  fk/pk conjuncts: assumed fk/pk
> |  runtime filters: RF002[bloom] <- EXPR$0
> {noformat}
> Since EXPR$0 is not a named column, it has a null resolved path and can be
> skipped when collecting the join columns.
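
The analysis above boils down to a null guard when collecting join columns: slots without a resolved path (such as the Calcite-generated EXPR$0) are skipped instead of dereferenced. The following is a minimal, hypothetical sketch of that idea. `SlotRefStub`, `ResolvedPath`, and `collectJoinColumns` are illustrative stand-ins, not Impala's actual `SlotRef`, `Path`, or `Analyzer.addColumnsTo` code, and the real patch may be structured differently.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch (not Impala code): collect fully qualified column
 * names referenced by join predicates, skipping slots that have no
 * resolved path, e.g. an unnamed expression such as EXPR$0.
 */
public class JoinColumnSketch {

  /** Stand-in for a resolved column path (table plus column name). */
  static final class ResolvedPath {
    final String table;
    final String column;

    ResolvedPath(String table, String column) {
      this.table = table;
      this.column = column;
    }

    String fullName() {
      return table + "." + column;
    }
  }

  /** Stand-in for a slot reference; resolvedPath is null for unnamed expressions. */
  static final class SlotRefStub {
    final String label;
    final ResolvedPath resolvedPath;

    SlotRefStub(String label, ResolvedPath resolvedPath) {
      this.label = label;
      this.resolvedPath = resolvedPath;
    }

    ResolvedPath getResolvedPath() {
      return resolvedPath;
    }
  }

  /** Returns the named columns behind the given slots, in encounter order. */
  static Set<String> collectJoinColumns(List<SlotRefStub> slots) {
    Set<String> columns = new LinkedHashSet<>();
    slots.stream()
        .map(SlotRefStub::getResolvedPath)
        // Without this filter, calling fullName() on a null path would throw
        // a NullPointerException, the failure mode described in this issue.
        .filter(Objects::nonNull)
        .map(ResolvedPath::fullName)
        .forEach(columns::add);
    return columns;
  }

  public static void main(String[] args) {
    List<SlotRefStub> slots = Arrays.asList(
        new SlotRefStub("ca_zip", new ResolvedPath("tpcds.customer_address", "ca_zip")),
        new SlotRefStub("EXPR$0", null));
    System.out.println(collectJoinColumns(slots));
  }
}
```

Running `main` prints `[tpcds.customer_address.ca_zip]`: the named column is recorded and EXPR$0 is silently dropped. Filtering rather than throwing matches the analysis that an unnamed expression has no column to report to the profile or workload management.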



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
