[
https://issues.apache.org/jira/browse/IMPALA-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934040#comment-17934040
]
Riza Suminto commented on IMPALA-13842:
---------------------------------------
One of these metrics should be used to detect whether test can wrap up or wait
a bit more.
{code:java}
const char* ImpaladMetricKeys::COMPLETED_QUERIES_QUEUED =
"impala-server.completed-queries.queued";
const char* ImpaladMetricKeys::COMPLETED_QUERIES_WRITTEN =
"impala-server.completed-queries.written";
const char* ImpaladMetricKeys::COMPLETED_QUERIES_FAIL =
"impala-server.completed-queries.failure";
const char* ImpaladMetricKeys::COMPLETED_QUERIES_SCHEDULED_WRITES =
"impala-server.completed-queries.scheduled-writes";
const char* ImpaladMetricKeys::COMPLETED_QUERIES_MAX_RECORDS_WRITES =
"impala-server.completed-queries.max-records-writes";
const char* ImpaladMetricKeys::COMPLETED_QUERIES_WRITE_DURATIONS =
"impala-server.completed-queries.write-durations"; {code}
If they don't get to expected value, then test should mention it in error
message or somewhere in stacktrace.
> Test against sys.impala_query_log should wait until coordinator quiesce.
> ------------------------------------------------------------------------
>
> Key: IMPALA-13842
> URL: https://issues.apache.org/jira/browse/IMPALA-13842
> Project: IMPALA
> Issue Type: Bug
> Components: Test
> Reporter: Riza Suminto
> Assignee: Jason Fehr
> Priority: Major
> Labels: broken-build
>
> We are seeing evidence of HMS deadlock for table sys.impala_query_log in
> downstream build.
> {noformat}
> I0307 02:24:06.992715 18405 SnapshotProducer.java:422]
> 5f4ecf63310cb7f5:e2db615800000000] Committed snapshot 4325155866837015464
> (MergeAppend)
> I0307 02:24:07.002012 18405 LoggingMetricsReporter.java:38]
> 5f4ecf63310cb7f5:e2db615800000000] Received metrics report:
> CommitReport{tableName=ImpalaHiveCatalog.sys.impala_query_log,
> snapshotId=4325155866837015464, sequenceNumber=9, operation=append,
> commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS,
> totalDuration=PT0.241321564S, count=1}, attempts=CounterResult{unit=COUNT,
> value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1},
> removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=9},
> addedDeleteFiles=null, addedEqualityDeleteFiles=null,
> addedPositionalDeleteFiles=null, removedDeleteFiles=null,
> removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null,
> totalDeleteFiles=CounterResult{unit=COUNT, value=0},
> addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null,
> totalRecords=CounterResult{unit=COUNT, value=41},
> addedFilesSizeInBytes=CounterResult{unit=BYTES, value=26244},
> removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES,
> value=273450}, addedPositionalDeletes=null, removedPositionalDeletes=null,
> totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
> addedEqualityDeletes=null, removedEqualityDeletes=null,
> totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
> metadata={iceberg-version=Apache Iceberg 1.5.2.2025.0.20.0-45 (commit
> f2db4a27a43aaec790af8b9e2ac51f4a50b26260)}}
> I0307 02:24:07.002182 18405 CatalogServiceCatalog.java:1327]
> 5f4ecf63310cb7f5:e2db615800000000] Added catalog version 2126 in table's
> sys.impala_query_log in-flight events
> W0307 02:24:07.053566 18405 Tasks.java:456]
> 5f4ecf63310cb7f5:e2db615800000000] Retrying task after failure: Waiting for
> lock on table sys.impala_query_log
> Java exception follows:
> org.apache.iceberg.hive.MetastoreLock$WaitingForLockException: Waiting for
> lock on table sys.impala_query_log
> at
> org.apache.iceberg.hive.MetastoreLock.lambda$acquireLock$1(MetastoreLock.java:217)
> at
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
> at
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
> at
> org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:209)
> at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146)
> at
> org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:182)
> at
> org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
> at
> org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:427)
> at
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
> at
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
> at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
> at
> org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:423)
> at
> org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:318)
> at
> org.apache.impala.service.CatalogOpExecutor.insertIntoIcebergTable(CatalogOpExecutor.java:7593)
> at
> org.apache.impala.service.CatalogOpExecutor.updateCatalogImpl(CatalogOpExecutor.java:7386)
> at
> org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:7227)
> at
> org.apache.impala.service.JniCatalog.lambda$updateCatalog$15(JniCatalog.java:510)
> at
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
> at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
> at
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:100)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:251)
> at
> org.apache.impala.service.JniCatalog.execAndSerialize(JniCatalog.java:265)
> at
> org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:509)
> ...
> W0307 02:25:49.672634 18405 Tasks.java:456]
> 5f4ecf63310cb7f5:e2db615800000000] Retrying task after failure: Waiting for
> lock on table sys.impala_query_log{noformat}
> This might be related to IMPALA-13772 that disable FETCH_ROWS_TIMEOUT_MS
> option.
> During custom cluster test run, impala minicluster is restart between tests.
> If test does not wait until coordinator quiesce (all pending INSERT INTO
> sys.impala_query_log is done), it might be the case that HMS lock is being
> hold when CatalogD terminates.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]