[
https://issues.apache.org/jira/browse/IMPALA-14107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17957020#comment-17957020
]
Quanlong Huang commented on IMPALA-14107:
-----------------------------------------
By adding some debug logs ([^IMPALA-14107-debug.patch]), I can see what's wrong
here. The bug is that we acquire the table lock and update a partition but
don't assign a new catalog version to the table in
CatalogOpExecutor#fireReloadEventAndUpdateRefreshEventId():
{code:java}
if (partition != null) {
  HdfsPartition.Builder partBuilder = new HdfsPartition.Builder(partition);
  partBuilder.setLastRefreshEventId(eventIds.get(0));
  hdfsTbl.updatePartition(partBuilder);
{code}
[https://github.com/apache/impala/blob/d630d6f8af8cd86a845fc0415c99b8aa6608c28f/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L7307]
HdfsTable#updatePartition() replaces the partition instance with a new one
(with a new partition id) so the catalog version of the table should be
increased to reflect the change.
Note that this is only required for partition-level REFRESH. For table-level
REFRESH, we update the table-level lastRefreshEventId, which is not sent to
coordinators, so nothing changes in the coordinator's view. For partition-level
REFRESH, the partition id changes, so the coordinator needs a new partition list.
If the catalog version of the table remains unchanged, the coordinator will keep
using the stale partition list, which has the stale partition id, and thus keeps
failing with a PARTITION_NOT_FOUND error.
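To illustrate the failure mode, here is a minimal toy model. It is not Impala code: ToyTable, ToyCoordinator and all of their methods are hypothetical names made up for this sketch. It only assumes what is described above, namely that replacing a partition gives it a new id, and that the coordinator refetches its partition list only when the table's catalog version changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model, not Impala code: every class and method name here is hypothetical. */
public class StaleVersionSketch {
  /** Catalogd-side table: partitions keyed by id, plus a catalog version. */
  static class ToyTable {
    long version = 1;
    long nextPartitionId = 0;
    final Map<Long, String> partitions = new HashMap<>();  // id -> partition name

    long addPartition(String name) {
      long id = nextPartitionId++;
      partitions.put(id, name);
      return id;
    }

    /** Replaces a partition with a new instance that gets a fresh id. */
    long replacePartition(long oldId, boolean bumpVersion) {
      String name = partitions.remove(oldId);
      long newId = addPartition(name);
      // The step under discussion: only a new version tells coordinators to refetch.
      if (bumpVersion) version++;
      return newId;
    }
  }

  /** Coordinator-side cache: refetches the partition ids only on a version change. */
  static class ToyCoordinator {
    long cachedVersion = -1;
    List<Long> cachedIds = new ArrayList<>();

    /** Returns true if every cached partition id still exists in the table. */
    boolean canResolveAll(ToyTable tbl) {
      if (tbl.version != cachedVersion) {
        cachedIds = new ArrayList<>(tbl.partitions.keySet());
        cachedVersion = tbl.version;
      }
      return tbl.partitions.keySet().containsAll(cachedIds);
    }
  }

  /** Warms the coordinator cache, replaces the partition, then looks it up again. */
  static boolean queryWorksAfterRefresh(boolean bumpVersion) {
    ToyTable tbl = new ToyTable();
    long id = tbl.addPartition("p=1");
    ToyCoordinator coord = new ToyCoordinator();
    coord.canResolveAll(tbl);
    tbl.replacePartition(id, bumpVersion);
    return coord.canResolveAll(tbl);
  }

  public static void main(String[] args) {
    System.out.println("without version bump: " + queryWorksAfterRefresh(false));
    System.out.println("with version bump: " + queryWorksAfterRefresh(true));
  }
}
```

In this toy model the lookup fails when the version is left unchanged and succeeds once it is bumped, which matches the repeated "Missing partition ID" warnings in the catalogd log.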
Uploaded the logs for the debug: [^impalad-IMPALA-14107-debug.INFO]
[^catalogd-IMPALA-14107-debug.INFO]
I was testing on commit d630d6f8a.
> test_reload_events_with_transient_partitions stuck in local catalog mode
> ------------------------------------------------------------------------
>
> Key: IMPALA-14107
> URL: https://issues.apache.org/jira/browse/IMPALA-14107
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
> Attachments: IMPALA-14107-debug.patch,
> catalogd-IMPALA-14107-debug.INFO, catalogd.INFO,
> impalad-IMPALA-14107-debug.INFO, impalad.INFO
>
>
> test_reload_events_with_transient_partitions can get stuck when running in
> local catalog mode. From catalogd.INFO, it looks like the catalog keeps looking
> for a non-existent partition id.
> {noformat}
> W20250527 16:04:11.311885 1240999 HdfsTable.java:2302]
> 5b44f2b8f845ceb7:2febe44200000000] Missing partition ID: 59, Table:
> test_reload_events_with_transient_partitions_local_catalog_77dd977e.tbl
> W20250527 16:04:12.118695 1240999 HdfsTable.java:2302]
> 5b44f2b8f845ceb7:2febe44200000000] Missing partition ID: 59, Table:
> test_reload_events_with_transient_partitions_local_catalog_77dd977e.tbl
> W20250527 16:04:13.121974 1240999 HdfsTable.java:2302]
> 5b44f2b8f845ceb7:2febe44200000000] Missing partition ID: 59, Table:
> test_reload_events_with_transient_partitions_local_catalog_77dd977e.tbl
> W20250527 16:04:14.325909 1240999 HdfsTable.java:2302]
> 5b44f2b8f845ceb7:2febe44200000000] Missing partition ID: 59, Table:
> test_reload_events_with_transient_partitions_local_catalog_77dd977e.tbl
> ...
> {noformat}
> Attached are both the impalad and catalogd logs from when the issue happens.