[
https://issues.apache.org/jira/browse/IMPALA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893759#comment-17893759
]
ASF subversion and git services commented on IMPALA-13467:
----------------------------------------------------------
Commit ff1c1cc99d4bc0633c9aa0c28edd5601c3186b8a in impala's branch
refs/heads/master from Peter Rozsa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ff1c1cc99 ]
IMPALA-13467: Fix partition list size calculation for empty Iceberg scan
nodes
This patch adds a condition that checks whether the IcebergScanNode
contains any files before using the size of the partition list. The
partition list size of Iceberg tables is always one regardless of the
scanned files, which can cause an NPE during runtime filter generation.
By setting the calculated partition size to 0, runtime filter
generation is skipped.
Change-Id: I5a0595831f3bd87074144ab7d5da27508e73ef33
Reviewed-on: http://gerrit.cloudera.org:8080/21964
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
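The guard described in the commit message can be sketched as a standalone Java snippet. All names here (effectivePartitionCount, the string-based file and partition lists) are hypothetical stand-ins; the actual patch lives in Impala's scan-node planning code, not in a class like this:

```java
import java.util.Collections;
import java.util.List;

class PartitionCountSketch {
    // fileDescriptors: files selected for the scan; partitions: Iceberg's
    // single synthetic partition entry (always size 1 for Iceberg tables).
    static long effectivePartitionCount(List<String> fileDescriptors,
                                        List<String> partitions) {
        // Guard: with no files there is nothing to scan, so report 0
        // partitions; runtime filter generation then has nothing to do.
        if (fileDescriptors.isEmpty()) return 0;
        return partitions.size();
    }

    public static void main(String[] args) {
        List<String> icebergPartitions = Collections.singletonList("synthetic");
        // Empty scan node: the guard kicks in.
        System.out.println(effectivePartitionCount(
            Collections.emptyList(), icebergPartitions));  // prints 0
        // Non-empty scan node: the partition list size is used as before.
        System.out.println(effectivePartitionCount(
            Collections.singletonList("file1.parq"), icebergPartitions));  // prints 1
    }
}
```

Without the guard, an empty scan node would still report one partition and runtime filter generation would proceed against state that was never initialized.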
> test_min_max_filters() failed due to NullPointerException
> ---------------------------------------------------------
>
> Key: IMPALA-13467
> URL: https://issues.apache.org/jira/browse/IMPALA-13467
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.5.0
> Reporter: Fang-Yu Rao
> Assignee: Peter Rozsa
> Priority: Critical
> Labels: broken-build
>
> We found that the following query in
> [min_max_filters.test|https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test]
> could fail with a NullPointerException.
> {code:java}
> ---- QUERY
> SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS;
> select * from functional_parquet.iceberg_partitioned i1,
> functional_parquet.iceberg_partitioned i2
> where i1.action = i2.action and
> i1.id = i2.id and
> i2.event_time = '2020-01-01 10:00:00';
> ---- RUNTIME_PROFILE
> row_regex:.* RF00.\[min_max\] -> i1\.action.*
> {code}
> The stack trace is shown below.
> {code:java}
> I1018 18:26:21.967474 15092 Frontend.java:2190]
> 2449ca58b6c7b2c3:20e13eca00000000] Analyzing query: select * from
> functional_parquet.iceberg_partitioned i1,
> functional_parquet.iceberg_partitioned i2
> where i1.action = i2.action and
> i1.id = i2.id and
> i2.event_time = '2020-01-01 10:00:00' db: functional_kudu
> I1018 18:26:21.967491 15092 Frontend.java:2202]
> 2449ca58b6c7b2c3:20e13eca00000000] The original executor group sets from
> executor membership snapshot: [TExecutorGroupSet(curr_num_ex
> I1018 18:26:21.967509 15092 RequestPoolService.java:200]
> 2449ca58b6c7b2c3:20e13eca00000000] Default pool only, scheduler allocation is
> not specified.
> I1018 18:26:21.967532 15092 Frontend.java:2222]
> 2449ca58b6c7b2c3:20e13eca00000000] A total of 2 executor group sets to be
> considered for auto-scaling: [TExecutorGroupSet(curr_num_ex
> I1018 18:26:21.967546 15092 Frontend.java:2263]
> 2449ca58b6c7b2c3:20e13eca00000000] Consider executor group set:
> TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, ex
> I1018 18:26:21.968324 15092 AnalysisContext.java:521]
> 2449ca58b6c7b2c3:20e13eca00000000] Analysis took 0 ms
> I1018 18:26:21.968353 15092 BaseAuthorizationChecker.java:114]
> 2449ca58b6c7b2c3:20e13eca00000000] Authorization check took 0 ms
> I1018 18:26:21.968367 15092 Frontend.java:2599]
> 2449ca58b6c7b2c3:20e13eca00000000] Analysis and authorization finished.
> I1018 18:26:21.968899 15092 IcebergScanPlanner.java:846]
> 2449ca58b6c7b2c3:20e13eca00000000] Push down the predicate:
> ref(name="event_time") == 1577901600000000 to iceberg
> I1018 18:26:21.969009 15092 SnapshotScan.java:124]
> 2449ca58b6c7b2c3:20e13eca00000000] Scanning table
> hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> snapshot
> I1018 18:26:21.969400 15092 LoggingMetricsReporter.java:38]
> 2449ca58b6c7b2c3:20e13eca00000000] Received metrics report:
> ScanReport{tableName=hdfs://localhost:20500/test-warehouse/ic
> I1018 18:26:21.969846 15092 jni-util.cc:321]
> 2449ca58b6c7b2c3:20e13eca00000000] java.lang.NullPointerException
>     at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:903)
>     at org.apache.impala.planner.HdfsScanNode.initOverlapPredicate(HdfsScanNode.java:845)
>     at org.apache.impala.planner.RuntimeFilterGenerator.assignRuntimeFilters(RuntimeFilterGenerator.java:1257)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1159)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1162)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1157)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1162)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateFilters(RuntimeFilterGenerator.java:1091)
>     at org.apache.impala.planner.RuntimeFilterGenerator.generateRuntimeFilters(RuntimeFilterGenerator.java:918)
>     at org.apache.impala.planner.Planner.createPlanFragments(Planner.java:160)
>     at org.apache.impala.planner.Planner.createPlans(Planner.java:310)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1969)
>     at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2968)
>     at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2730)
>     at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2269)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2030)
>     at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> {code}
>
> We recently made changes to
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
> in IMPALA-12861, so this may be related.
>
> The NullPointerException was thrown in initOverlapPredicate() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
> due to 'statsTuple_' being null.
> {code:java}
> public void initOverlapPredicate(Analyzer analyzer) {
>   if (!allParquet_) return;
>   Preconditions.checkNotNull(statsTuple_);
>   ..
> }
> {code}
> 'statsTuple_' is set in computeStatsTupleAndConjuncts() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java],
> which in turn is called from init() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java],
> but only if hasParquet(fileFormats_) or hasOrc(fileFormats_)
> evaluates to true. I am wondering whether 'statsTuple_' could,
> for some reason, be left unpopulated.
>
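Putting the two observations together, a plausible failure chain can be sketched as follows. This is an illustration with hypothetical stand-in names (the enum, statsTuple, and the two methods mimic the gating described above, not Impala's actual code): an empty Iceberg scan node contributes no file formats, so the stats tuple is never initialized, and the later not-null check throws.

```java
import java.util.EnumSet;
import java.util.Objects;

class StatsTupleGatingSketch {
    enum FileFormat { PARQUET, ORC, TEXT }

    // Stands in for HdfsScanNode.statsTuple_.
    static Object statsTuple = null;

    // Mirrors init(): the stats tuple is computed only when the scanned
    // files include Parquet or ORC.
    static void init(EnumSet<FileFormat> formats) {
        if (formats.contains(FileFormat.PARQUET)
                || formats.contains(FileFormat.ORC)) {
            statsTuple = new Object();  // stands in for computeStatsTupleAndConjuncts()
        }
    }

    // Mirrors initOverlapPredicate(): a not-null precondition on the stats
    // tuple, which is where the NPE in the trace surfaced.
    static void initOverlapPredicate() {
        Objects.requireNonNull(statsTuple);
    }

    public static void main(String[] args) {
        // An empty scan node has no files, hence an empty format set.
        init(EnumSet.noneOf(FileFormat.class));
        try {
            initOverlapPredicate();
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");  // this branch runs
        }
    }
}
```

With the fix in ff1c1cc, an empty scan node reports zero partitions, runtime filter generation is skipped, and initOverlapPredicate() is never reached on the null tuple.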
--
This message was sent by Atlassian Jira
(v8.20.10#820010)