[ https://issues.apache.org/jira/browse/IMPALA-13505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Fehr updated IMPALA-13505:
--------------------------------
Description:
Encountered a NullPointerException when running some TPC-DS queries (such as
q8) with the Calcite planner:
{code:java}
Stack Trace:java.lang.NullPointerException
    at org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
    at java.util.Collections$2.tryAdvance(Collections.java:4719)
    at java.util.Collections$2.forEachRemaining(Collections.java:4727)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
    at org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
    at org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
    at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
    at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
    at org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.<init>(ImpalaHashJoinNode.java:46)
    ...
{code}
{{SlotRef.getResolvedPath}} returns null at line 4699 of Analyzer.java. Looking
at the SlotRef, I don't see any way to determine an origin, so this may be part
of an incomplete implementation of the Calcite planner integration.
To reproduce:
{code:java}
$ start-impala-cluster.py -s 1 --use_calcite_planner=true
$ impala-py.test tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
{code}
*Analysis:*
The root issue with this particular query is that it contains a very lengthy
[list of zip codes|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test#L12-L62]
that is used in a WHERE clause. The Calcite planner produces this join node for
that WHERE clause:
{noformat}
05:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
| fk/pk conjuncts: assumed fk/pk
| runtime filters: RF002[bloom] <- EXPR$0
{noformat}
Since EXPR$0 is not a named column, its resolved path is null, so it can be
skipped rather than dereferenced.
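As a rough illustration of the "skip null resolved paths" idea, here is a minimal, self-contained sketch. It does not use Impala's real classes: {{FakeSlotRef}}, {{collectColumnPaths}}, and {{ResolvedPathSketch}} are hypothetical stand-ins, and the actual change in {{Analyzer.resolveActualPath}} may look quite different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-ins for illustration only; these are NOT Impala's classes.
class ResolvedPathSketch {
  /** Simplified slot reference whose resolved path may be null. */
  static final class FakeSlotRef {
    private final String label;
    private final String resolvedPath;  // null when not backed by a column path

    FakeSlotRef(String label, String resolvedPath) {
      this.label = label;
      this.resolvedPath = resolvedPath;
    }

    String getLabel() { return label; }
    String getResolvedPath() { return resolvedPath; }
  }

  /** Collects column paths, skipping slot refs that have no resolved path. */
  static List<String> collectColumnPaths(List<FakeSlotRef> slotRefs) {
    return slotRefs.stream()
        .map(FakeSlotRef::getResolvedPath)
        .filter(Objects::nonNull)  // guard: refs like EXPR$0 are dropped here
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FakeSlotRef> refs = Arrays.asList(
        new FakeSlotRef("ca_zip", "tpcds.customer_address.ca_zip"),
        new FakeSlotRef("EXPR$0", null));  // computed expression, no column path
    System.out.println(collectColumnPaths(refs));
  }
}
```

Filtering before any dereference means a synthetic expression column is ignored, while real column paths are still collected.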
was:
Encountered a NullPointerException when running some TPC-DS queries (such as
q8) with the Calcite planner:
{code:java}
Stack Trace:java.lang.NullPointerException
    at org.apache.impala.analysis.Analyzer.lambda$resolveActualPath$18(Analyzer.java:4699)
    at java.util.Collections$2.tryAdvance(Collections.java:4719)
    at java.util.Collections$2.forEachRemaining(Collections.java:4727)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at org.apache.impala.analysis.Analyzer.resolveActualPath(Analyzer.java:4690)
    at org.apache.impala.analysis.Analyzer.lambda$addColumnsTo$17(Analyzer.java:4655)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.apache.impala.analysis.Analyzer.addColumnsTo(Analyzer.java:4655)
    at org.apache.impala.analysis.Analyzer.addJoinColumns(Analyzer.java:4732)
    at org.apache.impala.planner.JoinNode.init(JoinNode.java:293)
    at org.apache.impala.planner.HashJoinNode.init(HashJoinNode.java:82)
    at org.apache.impala.calcite.rel.phys.ImpalaHashJoinNode.<init>(ImpalaHashJoinNode.java:46)
    ...
{code}
{{SlotRef.getResolvedPath}} returns null at line 4699 of Analyzer.java. Looking
at the SlotRef, I don't see any way to determine an origin, so this may be part
of an incomplete implementation of the Calcite planner integration.
To reproduce:
{code:java}
$ start-impala-cluster.py -s 1 --use_calcite_planner=true
$ impala-py.test tests/query_test/test_tpcds_queries.py::TestTpcdsDecimalV2Query::test_tpcds_q8
{code}
*Analysis:*
The root issue with this particular query is that it contains a very lengthy
[list of zip codes|https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q8.test#L12-L62]
that is used in a WHERE clause. The Calcite planner produces this join node for
that WHERE clause:
{noformat}
05:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: substring(tpcds.customer_address.ca_zip, 1, 5) = EXPR$0
| fk/pk conjuncts: assumed fk/pk
| runtime filters: RF002[bloom] <- EXPR$0
{noformat}
Since EXPR$0 is not a path to a column, its resolved path is null, so it can be
skipped rather than dereferenced.
> NullPointerException in Analyzer.resolveActualPath with Calcite planner
> -----------------------------------------------------------------------
>
> Key: IMPALA-13505
> URL: https://issues.apache.org/jira/browse/IMPALA-13505
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Michael Smith
> Assignee: Jason Fehr
> Priority: Major
> Labels: calcite