[
https://issues.apache.org/jira/browse/IMPALA-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894016#comment-17894016
]
Fang-Yu Rao commented on IMPALA-13467:
--------------------------------------
[~prozsa] and I had a discussion regarding this JIRA. We feel it would be good
to leave some notes here so that this fix is easier to understand for people
without any background knowledge of Iceberg in Impala.
- *Why did we hit this issue after IMPALA-12861?*
Before IMPALA-12861, {{partition.getFileFormat()}} evaluated to {{PARQUET}} in
the following loop in {{HdfsScanNode.java}}
([https://gerrit.cloudera.org/c/21871/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#b525]),
and thus '{{fileFormats_}}' would be non-empty even for an instance of
{{IcebergScanNode}}.
{code:java}
// Populate fileFormats_.
for (FeFsPartition partition : getSampledOrRawPartitions()) {
  if (partition.getFileFormat() != HdfsFileFormat.ICEBERG) {
    fileFormats_.add(partition.getFileFormat());
  }
}
{code}
After IMPALA-12861, the for-loop above is no longer executed. Instead, the
following method in {{IcebergScanNode.java}} is called. For an instance of
{{IcebergScanNode}}, '{{fileFormats_}}' will be empty if there is no file
descriptor.
{code:java}
@Override
protected void populateFileFormats() throws ImpalaRuntimeException {
  // TODO IMPALA-11577: optimize file format counting
  boolean hasParquet = false;
  boolean hasOrc = false;
  boolean hasAvro = false;
  for (FileDescriptor fileDesc : fileDescs_) {
    byte fileFormat =
        fileDesc.getFbFileMetadata().icebergMetadata().fileFormat();
    if (fileFormat == FbIcebergDataFileFormat.PARQUET) {
      hasParquet = true;
    } else if (fileFormat == FbIcebergDataFileFormat.ORC) {
      hasOrc = true;
    } else if (fileFormat == FbIcebergDataFileFormat.AVRO) {
      hasAvro = true;
    } else {
      throw new ImpalaRuntimeException(String.format(
          "Invalid Iceberg file format of file: %s",
          fileDesc.getAbsolutePath()));
    }
  }
  if (hasParquet) fileFormats_.add(HdfsFileFormat.PARQUET);
  if (hasOrc) fileFormats_.add(HdfsFileFormat.ORC);
  if (hasAvro) fileFormats_.add(HdfsFileFormat.AVRO);
}
{code}
Recall that '{{statsTuple_}}' is written only if {{hasParquet(fileFormats_)}}
or {{hasOrc(fileFormats_)}} evaluates to true. Hence, before IMPALA-12861,
whether or not there was any file descriptor associated with the
{{IcebergScanNode}}, '{{fileFormats_}}' was always non-empty, which resulted
in a non-null '{{statsTuple_}}', so that {{Preconditions.checkNotNull()}} was
never hit. But after IMPALA-12861, '{{fileFormats_}}' will be empty if there
is no file descriptor associated with the {{IcebergScanNode}}, resulting in a
null '{{statsTuple_}}'. Hence we hit the {{Preconditions.checkNotNull()}} for
'{{statsTuple_}}' if we do not skip the check.
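The chain above can be modeled with a small, self-contained sketch. This is
not Impala code: {{StatsTupleSketch}} and {{computeStatsTuple}} are
hypothetical stand-ins, and {{Objects.requireNonNull}} stands in for Guava's
{{Preconditions.checkNotNull}} (both throw {{NullPointerException}} on null).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class StatsTupleSketch {
    // Hypothetical stand-in for the file formats tracked by a scan node.
    enum FileFormat { PARQUET, ORC, AVRO }

    // Mirrors init(): the stats tuple is only materialized when the scan
    // has Parquet or ORC files.
    static Object computeStatsTuple(Set<FileFormat> fileFormats) {
        if (fileFormats.contains(FileFormat.PARQUET)
                || fileFormats.contains(FileFormat.ORC)) {
            return new Object(); // placeholder for the real stats tuple
        }
        return null; // no file descriptors -> empty set -> null tuple
    }

    // Mirrors the Preconditions.checkNotNull(statsTuple_) check in
    // initOverlapPredicate(): throws NullPointerException on null.
    static void initOverlapPredicate(Object statsTuple) {
        Objects.requireNonNull(statsTuple);
    }

    public static void main(String[] args) {
        // An IcebergScanNode with no file descriptors after IMPALA-12861:
        Set<FileFormat> formats = new HashSet<>();
        try {
            initOverlapPredicate(computeStatsTuple(formats));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in IMPALA-13467");
        }
    }
}
```

Before IMPALA-12861, the set always contained {{PARQUET}} for such a node, so
{{computeStatsTuple}} in this sketch would never have returned null.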
- *Why didn't we catch this issue in Apache Impala's pre-commit tests?*
We suspect this is due to the timezone in effect when the query was executed.
For instance, on my machine, without setting any timezone explicitly (or with
it set to ""), and without the fix, we could hit the {{NullPointerException}}
after IMPALA-12861.
{code:java}
[localhost:21050] functional_parquet> select * from functional_parquet.iceberg_partitioned i1,
                                    > functional_parquet.iceberg_partitioned i2
                                    > where i1.action = i2.action and
                                    > i1.id = i2.id and
                                    > i2.event_time = '2020-01-01 10:00:00';
Query: select * from functional_parquet.iceberg_partitioned i1,
functional_parquet.iceberg_partitioned i2
where i1.action = i2.action and
i1.id = i2.id and
i2.event_time = '2020-01-01 10:00:00'
Query submitted at: 2024-10-29 11:57:20 (Coordinator: http://fangyu:25000)
ERROR: Query 2341ee29ae178280:062f18c400000000 failed:
NullPointerException: null
{code}
But when we set the timezone to "{{Europe/Budapest}}", the
{{NullPointerException}} is masked. We did not hit the issue because the
number of file descriptors associated with the {{IcebergScanNode}} is no
longer 0 in this case, so '{{fileFormats_}}' is not empty.
{code:java}
[localhost:21050] functional_parquet> set timezone="Europe/Budapest";
TIMEZONE set to "Europe/Budapest"
[localhost:21050] functional_parquet> select * from functional_parquet.iceberg_partitioned i1,
                                    > functional_parquet.iceberg_partitioned i2
                                    > where i1.action = i2.action and
                                    > i1.id = i2.id and
                                    > i2.event_time = '2020-01-01 10:00:00';
Query: select * from functional_parquet.iceberg_partitioned i1,
functional_parquet.iceberg_partitioned i2
where i1.action = i2.action and
i1.id = i2.id and
i2.event_time = '2020-01-01 10:00:00'
Query submitted at: 2024-10-29 12:04:04 (Coordinator: http://fangyu:25000)
Query state can be monitored at:
http://fangyu:25000/query_plan?query_id=4c49ec602a34f253:a1c3112200000000
+----+------+--------+---------------------+----+------+--------+---------------------+
| id | user | action | event_time          | id | user | action | event_time          |
+----+------+--------+---------------------+----+------+--------+---------------------+
| 13 | Alan | click  | 2020-01-01 10:00:00 | 13 | Alan | click  | 2020-01-01 10:00:00 |
| 9  | Alan | click  | 2020-01-01 10:00:00 | 9  | Alan | click  | 2020-01-01 10:00:00 |
| 3  | Alan | click  | 2020-01-01 10:00:00 | 3  | Alan | click  | 2020-01-01 10:00:00 |
| 12 | Alan | click  | 2020-01-01 10:00:00 | 12 | Alan | click  | 2020-01-01 10:00:00 |
| 18 | Alan | click  | 2020-01-01 10:00:00 | 18 | Alan | click  | 2020-01-01 10:00:00 |
| 10 | Alan | click  | 2020-01-01 10:00:00 | 10 | Alan | click  | 2020-01-01 10:00:00 |
+----+------+--------+---------------------+----+------+--------+---------------------+
Fetched 6 row(s) in 0.88s
{code}
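The timezone dependence can be sketched as follows: the timestamp literal is
folded into epoch microseconds in the session timezone before being pushed
down to Iceberg for partition pruning, so different zones select different
(possibly empty) sets of data files. {{TzDemo}} and {{toEpochMicros}} are
hypothetical names, and {{America/Los_Angeles}} is only an assumed stand-in
for the machine's default zone, though it does reproduce the pushed value
{{1577901600000000}} seen in the planner log quoted below.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TzDemo {
    // Convert a timestamp literal to epoch microseconds the way a planner
    // would before pushing "event_time == <literal>" down for pruning.
    static long toEpochMicros(String literal, String zone) {
        LocalDateTime ldt = LocalDateTime.parse(literal.replace(' ', 'T'));
        return ldt.atZone(ZoneId.of(zone)).toInstant().getEpochSecond()
            * 1_000_000L;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the default zone; yields 1577901600000000,
        // the value pushed to Iceberg in the failing run.
        System.out.println(
            toEpochMicros("2020-01-01 10:00:00", "America/Los_Angeles"));
        // Europe/Budapest maps the same literal to a different instant,
        // one that does match data files in this test table.
        System.out.println(
            toEpochMicros("2020-01-01 10:00:00", "Europe/Budapest"));
    }
}
```

With the first value no data file survives pruning, '{{fileFormats_}}' stays
empty, and the bug surfaces; with the second, files remain and it is masked.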
More importantly, the takeaway from this incident, according to [~prozsa], is
that we should refactor the partition handling of the Iceberg-related code in
the near future if possible.
> test_min_max_filters() failed due to NullPointerException
> ---------------------------------------------------------
>
> Key: IMPALA-13467
> URL: https://issues.apache.org/jira/browse/IMPALA-13467
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.5.0
> Reporter: Fang-Yu Rao
> Assignee: Peter Rozsa
> Priority: Critical
> Labels: broken-build
>
> We found that the following query in
> [min_max_filters.test|https://github.com/apache/impala/blame/master/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test]
> could fail due to NullPointerException.
> {code:java}
> ---- QUERY
> SET RUNTIME_FILTER_WAIT_TIME_MS=$RUNTIME_FILTER_WAIT_TIME_MS;
> select * from functional_parquet.iceberg_partitioned i1,
> functional_parquet.iceberg_partitioned i2
> where i1.action = i2.action and
> i1.id = i2.id and
> i2.event_time = '2020-01-01 10:00:00';
> ---- RUNTIME_PROFILE
> row_regex:.* RF00.\[min_max\] -> i1\.action.*
> {code}
> The stack trace was below.
> {code:java}
> I1018 18:26:21.967474 15092 Frontend.java:2190]
> 2449ca58b6c7b2c3:20e13eca00000000] Analyzing query: select * from
> functional_parquet.iceberg_partitioned i1,
> functional_parquet.iceberg_partitioned i2
> where i1.action = i2.action and
> i1.id = i2.id and
> i2.event_time = '2020-01-01 10:00:00' db: functional_kudu
> I1018 18:26:21.967491 15092 Frontend.java:2202]
> 2449ca58b6c7b2c3:20e13eca00000000] The original executor group sets from
> executor membership snapshot: [TExecutorGroupSet(curr_num_ex
> I1018 18:26:21.967509 15092 RequestPoolService.java:200]
> 2449ca58b6c7b2c3:20e13eca00000000] Default pool only, scheduler allocation is
> not specified.
> I1018 18:26:21.967532 15092 Frontend.java:2222]
> 2449ca58b6c7b2c3:20e13eca00000000] A total of 2 executor group sets to be
> considered for auto-scaling: [TExecutorGroupSet(curr_num_ex
> I1018 18:26:21.967546 15092 Frontend.java:2263]
> 2449ca58b6c7b2c3:20e13eca00000000] Consider executor group set:
> TExecutorGroupSet(curr_num_executors:3, expected_num_executors:20, ex
> I1018 18:26:21.968324 15092 AnalysisContext.java:521]
> 2449ca58b6c7b2c3:20e13eca00000000] Analysis took 0 ms
> I1018 18:26:21.968353 15092 BaseAuthorizationChecker.java:114]
> 2449ca58b6c7b2c3:20e13eca00000000] Authorization check took 0 ms
> I1018 18:26:21.968367 15092 Frontend.java:2599]
> 2449ca58b6c7b2c3:20e13eca00000000] Analysis and authorization finished.
> I1018 18:26:21.968899 15092 IcebergScanPlanner.java:846]
> 2449ca58b6c7b2c3:20e13eca00000000] Push down the predicate:
> ref(name="event_time") == 1577901600000000 to iceberg
> I1018 18:26:21.969009 15092 SnapshotScan.java:124]
> 2449ca58b6c7b2c3:20e13eca00000000] Scanning table
> hdfs://localhost:20500/test-warehouse/iceberg_test/iceberg_partitioned
> snapshot
> I1018 18:26:21.969400 15092 LoggingMetricsReporter.java:38]
> 2449ca58b6c7b2c3:20e13eca00000000] Received metrics report:
> ScanReport{tableName=hdfs://localhost:20500/test-warehouse/ic
> I1018 18:26:21.969846 15092 jni-util.cc:321]
> 2449ca58b6c7b2c3:20e13eca00000000] java.lang.NullPointerException
> at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:903)
> at org.apache.impala.planner.HdfsScanNode.initOverlapPredicate(HdfsScanNode.java:845)
> at org.apache.impala.planner.RuntimeFilterGenerator.assignRuntimeFilters(RuntimeFilterGenerator.java:1257)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1159)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1162)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1157)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateFiltersRecursive(RuntimeFilterGenerator.java:1162)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateFilters(RuntimeFilterGenerator.java:1091)
> at org.apache.impala.planner.RuntimeFilterGenerator.generateRuntimeFilters(RuntimeFilterGenerator.java:918)
> at org.apache.impala.planner.Planner.createPlanFragments(Planner.java:160)
> at org.apache.impala.planner.Planner.createPlans(Planner.java:310)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1969)
> at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2968)
> at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2730)
> at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2269)
> at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2030)
> at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> {code}
>
> We recently had changes in
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
> in IMPALA-12861 so maybe it's related.
>
> The NullPointerException was thrown in initOverlapPredicate() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
> due to 'statsTuple_' being null.
> {code:java}
> public void initOverlapPredicate(Analyzer analyzer) {
>   if (!allParquet_) return;
>   Preconditions.checkNotNull(statsTuple_);
>   ..
> }
> {code}
> 'statsTuple_' is written in computeStatsTupleAndConjuncts() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java],
> which in turn is called in init() of
> [HdfsScanNode.java|https://github.com/apache/impala/blame/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java]
> and will be called only if hasParquet(fileFormats_) or hasOrc(fileFormats_)
> evaluates to true. I am wondering if it's possible that for some reason
> 'statsTuple_' is not populated.
>