[
https://issues.apache.org/jira/browse/IMPALA-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934664#comment-17934664
]
Jason Fehr commented on IMPALA-13842:
-------------------------------------
The issue that caused this particular failure is the
TestQueryLogTableBeeswax::test_redaction test in test_query_log.py. It does
not do a graceful shutdown, and the cluster was ungracefully brought down while
the query was being finalized. Specifically, the coordinator ended during the
FinalizeDml() call for the workload management insert dml. The last two lines
were:
{noformat}
I0307 02:23:41.181077 16135 coordinator.cc:919]
4e4d8c31c6fa4c77:dd57691f00000000] Finalizing query:
4e4d8c31c6fa4c77:dd57691f00000000
I0307 02:23:41.181205 16135 coordinator.cc:796]
4e4d8c31c6fa4c77:dd57691f00000000] ExecState: query
id=4e4d8c31c6fa4c77:dd57691f00000000 execution completed
I0307 02:23:41.181236 16135 coordinator.cc:1465]
4e4d8c31c6fa4c77:dd57691f00000000] Release admission control resources for
query_id=4e4d8c31c6fa4c77:dd57691f00000000
I0307 02:23:41.181377 16135 client-request-state.cc:1612]
4e4d8c31c6fa4c77:dd57691f00000000] Updating metastore with 1 altered partitions
(cluster_id=__HIVE_DEFAULT_PARTITION__/start_time_utc_hour=2025-03-07-10)
I0307 02:23:41.181393 16135 client-request-state.cc:1640]
4e4d8c31c6fa4c77:dd57691f00000000] Executing FinalizeDml() using CatalogService
Caught signal: SIGTERM. Daemon will exit.
{noformat}
Since the query was not allowed to gracefully finish, the HMS lock on the
sys.impala_query_log table was never released thus causing the following tests
to fail.
> Test against sys.impala_query_log should wait until coordinator quiesce.
> ------------------------------------------------------------------------
>
> Key: IMPALA-13842
> URL: https://issues.apache.org/jira/browse/IMPALA-13842
> Project: IMPALA
> Issue Type: Bug
> Components: Test
> Reporter: Riza Suminto
> Assignee: Jason Fehr
> Priority: Major
> Labels: broken-build
>
> We are seeing evidence of HMS deadlock for table sys.impala_query_log in
> downstream build.
> {noformat}
> I0307 02:24:06.992715 18405 SnapshotProducer.java:422]
> 5f4ecf63310cb7f5:e2db615800000000] Committed snapshot 4325155866837015464
> (MergeAppend)
> I0307 02:24:07.002012 18405 LoggingMetricsReporter.java:38]
> 5f4ecf63310cb7f5:e2db615800000000] Received metrics report:
> CommitReport{tableName=ImpalaHiveCatalog.sys.impala_query_log,
> snapshotId=4325155866837015464, sequenceNumber=9, operation=append,
> commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
> totalDuration=PT0.241321564S, count=1}, attempts=CounterResult{unit=COUNT,
> value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1},
> removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=9},
> addedDeleteFiles=null, addedEqualityDeleteFiles=null,
> addedPositionalDeleteFiles=null, removedDeleteFiles=null,
> removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
> totalDeleteFiles=CounterResult{unit=COUNT, value=0},
> addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null,
> totalRecords=CounterResult{unit=COUNT, value=41},
> addedFilesSizeInBytes=CounterResult{unit=BYTES, value=26244},
> removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES,
> value=273450}, addedPositionalDeletes=null, removedPositionalDeletes=null,
> totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
> addedEqualityDeletes=null, removedEqualityDeletes=null,
> totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
> metadata={iceberg-version=Apache Iceberg 1.5.2.2025.0.20.0-45 (commit
> f2db4a27a43aaec790af8b9e2ac51f4a50b26260)}}
> I0307 02:24:07.002182 18405 CatalogServiceCatalog.java:1327]
> 5f4ecf63310cb7f5:e2db615800000000] Added catalog version 2126 in table's
> sys.impala_query_log in-flight events
> W0307 02:24:07.053566 18405 Tasks.java:456]
> 5f4ecf63310cb7f5:e2db615800000000] Retrying task after failure: Waiting for
> lock on table sys.impala_query_log
> Java exception follows:
> org.apache.iceberg.hive.MetastoreLock$WaitingForLockException: Waiting for
> lock on table sys.impala_query_log
> at
> org.apache.iceberg.hive.MetastoreLock.lambda$acquireLock$1(MetastoreLock.java:217)
> at
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
> at
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
> at
> org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:209)
> at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146)
> at
> org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:182)
> at
> org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
> at
> org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:427)
> at
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
> at
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
> at
> org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:423)
> at
> org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:318)
> at
> org.apache.impala.service.CatalogOpExecutor.insertIntoIcebergTable(CatalogOpExecutor.java:7593)
> at
> org.apache.impala.service.CatalogOpExecutor.updateCatalogImpl(CatalogOpExecutor.java:7386)
> at
> org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:7227)
> at
> org.apache.impala.service.JniCatalog.lambda$updateCatalog$15(JniCatalog.java:510)
> at
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:251)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:265)
> at
> org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:509)
> ...
> W0307 02:25:49.672634 18405 Tasks.java:456]
> 5f4ecf63310cb7f5:e2db615800000000] Retrying task after failure: Waiting for
> lock on table sys.impala_query_log{noformat}
> This might be related to IMPALA-13772 that disable FETCH_ROWS_TIMEOUT_MS
> option.
> During custom cluster test run, impala minicluster is restart between tests.
> If test does not wait until coordinator quiesce (all pending INSERT INTO
> sys.impala_query_log is done), it might be the case that HMS lock is being
> hold when CatalogD terminates.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]