Quanlong Huang created IMPALA-13453:
---------------------------------------
Summary: REFRESH <table> PARTITION <partition> always updates the partition
Key: IMPALA-13453
URL: https://issues.apache.org/jira/browse/IMPALA-13453
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
In table-level REFRESH, we check whether each partition has actually changed and
skip updating unchanged partitions in the catalog:
[https://github.com/apache/impala/blob/42fda24364786cc1a457890bd212bb3922479e95/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1098-L1101]
{code:java}
public void updatePartition(HdfsPartition.Builder partBuilder)
    throws CatalogException {
  HdfsPartition oldPartition = partBuilder.getOldInstance();
  ...
  boolean partitionNotChanged = partBuilder.equalsToOriginal(oldPartition);
  LOG.trace("Partition {} {}", oldPartition.getName(),
      partitionNotChanged ? "unchanged" : "changed");
  // Nothing to do if the reloaded partition is identical to the cached one.
  if (partitionNotChanged) return;
  HdfsPartition newPartition = partBuilder.build();
  // Partition is reloaded and hence cache directives are not dropped.
  dropPartition(oldPartition, false);
  addPartition(newPartition);
}{code}
However, in partition-level REFRESH, we always drop and add the partition, even if nothing changed:
[https://github.com/apache/impala/blob/42fda24364786cc1a457890bd212bb3922479e95/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L3093-L3096]
{code:java}
for (Map.Entry<HdfsPartition.Builder, HdfsPartition> entry :
    partBuilderToPartitions.entrySet()) {
  if (entry.getValue() != null) {
    dropPartition(entry.getValue(), false);
  }
  addPartition(entry.getKey().build());
}{code}
We should add the same check to avoid updating unchanged partitions.
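A rough, self-contained sketch of what the check could look like in the partition-level loop. This is a toy model, not a patch: Partition, Builder, the fileMetadata field, the counters, and refreshPartitions() are hypothetical stand-ins for HdfsPartition, HdfsPartition.Builder, and the real loop in HdfsTable. Only the names equalsToOriginal, dropPartition, addPartition and partBuilderToPartitions come from the snippets above, and their real signatures differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the proposed check. None of these types are Impala's real classes. */
class PartitionRefreshSketch {
  /** Hypothetical stand-in for HdfsPartition: a name plus a metadata fingerprint. */
  static final class Partition {
    final String name;
    final String fileMetadata;
    Partition(String name, String fileMetadata) {
      this.name = name;
      this.fileMetadata = fileMetadata;
    }
  }

  /** Hypothetical stand-in for HdfsPartition.Builder holding the reloaded state. */
  static final class Builder {
    final String name;
    final String fileMetadata;
    Builder(String name, String fileMetadata) {
      this.name = name;
      this.fileMetadata = fileMetadata;
    }
    /** True if the reloaded state is identical to the cached partition. */
    boolean equalsToOriginal(Partition old) {
      return old != null && name.equals(old.name)
          && fileMetadata.equals(old.fileMetadata);
    }
    Partition build() { return new Partition(name, fileMetadata); }
  }

  // Cached partitions by name, plus counters so the effect of the check is visible.
  final Map<String, Partition> partitions = new LinkedHashMap<>();
  int dropCount = 0;
  int addCount = 0;

  void dropPartition(Partition p) { partitions.remove(p.name); dropCount++; }
  void addPartition(Partition p) { partitions.put(p.name, p); addCount++; }

  /** Applies reloaded partitions, skipping unchanged ones. Returns the update count. */
  int refreshPartitions(Map<Builder, Partition> partBuilderToPartitions) {
    int updated = 0;
    for (Map.Entry<Builder, Partition> entry : partBuilderToPartitions.entrySet()) {
      Builder builder = entry.getKey();
      Partition oldPartition = entry.getValue();
      if (oldPartition != null) {
        // Proposed check: leave the cached instance alone if nothing changed.
        if (builder.equalsToOriginal(oldPartition)) continue;
        dropPartition(oldPartition);
      }
      addPartition(builder.build());
      updated++;
    }
    return updated;
  }
}
```

In this model, an unchanged partition keeps its existing instance in the map, a changed one is dropped and re-added, and a new one (null old value) is simply added. Whether the real loop should reuse updatePartition() or inline the check is left open here.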
CC [~csringhofer], [~hemanth619]