github-actions[bot] commented on code in PR #60944:
URL: https://github.com/apache/doris/pull/60944#discussion_r3322775105
##########
fe/fe-core/src/main/java/org/apache/doris/common/proc/ColocationGroupProcDir.java:
##########
@@ -74,4 +93,73 @@ public ProcResult fetchResult() throws AnalysisException {
         result.setRows(infos);
         return result;
     }
+
+    private Map<Tag, List<List<Long>>> getCloudBackendSeqsFromTablets(GroupId groupId, ColocateTableIndex index) {
+        Map<Tag, List<List<Long>>> backendsSeq = Maps.newHashMap();
+        List<Long> tableIds = index.getAllTableIds(groupId);
+        for (Long tableId : tableIds) {
+            long dbId = groupId.dbId;
+            if (dbId == 0) {
+                Long tableDbId = index.getDbIdByTblIdNullable(groupId, tableId);
+                if (tableDbId == null) {
+                    continue;
+                }
+                dbId = tableDbId;
+            }
+            Database db = Env.getCurrentInternalCatalog().getDbNullable(dbId);
+            if (db == null) {
+                continue;
+            }
+            Table table = db.getTableNullable(tableId);
+            if (!(table instanceof OlapTable)) {
+                continue;
+            }
+            backendsSeq = getCloudBackendSeqsFromTable((OlapTable) table);
+            if (!backendsSeq.isEmpty()) {
+                return backendsSeq;
+            }
+        }
+        return backendsSeq;
+    }
+
+    private Map<Tag, List<List<Long>>> getCloudBackendSeqsFromTable(OlapTable olapTable) {
+        Map<Tag, List<List<Long>>> backendsSeq = Maps.newHashMap();
+        olapTable.readLock();
+        try {
+            Partition firstPartition = null;
+            for (Partition partition : olapTable.getAllPartitions()) {
+                firstPartition = partition;
+                break;
+            }
+            if (firstPartition == null) {
+                return backendsSeq;
+            }
+            MaterializedIndex baseIndex = firstPartition.getBaseIndex();
+            List<Tablet> tablets = baseIndex.getTablets();
+            List<List<Long>> bucketSeq = Lists.newArrayListWithCapacity(tablets.size());
+            boolean hasBackend = false;
+            for (int i = 0; i < tablets.size(); i++) {
+                List<Long> bucketBackends = new ArrayList<>();
+                for (Replica replica : tablets.get(i).getReplicas()) {
+                    long backendId = replica.getBackendIdWithoutException();
+                    if (backendId < 0 && replica instanceof CloudReplica) {
Review Comment:
This resolves `CloudReplica` backend ids while `olapTable.readLock()` is
still held. In cloud mode, `getBackendIdWithoutException()` dispatches to
`CloudReplica.getBackendId()`, which calls
`getCurrentClusterId()`/`waitForAutoStart()`. When the current compute group
is suspended, that path may notify the meta-service and wait up to the
auto-start timeout; the fallback `getPrimaryBackendId()` call goes through
the same path. A `SHOW PROC /colocation_group/...` request can therefore hold
the table metadata read lock across an external RPC or sleep, blocking DDL
and other writers on this table.

Please snapshot the first partition's tablets/replicas under the table lock
and resolve the cloud backend ids after releasing it, or use a non-blocking,
metadata-only backend id source for this display path.
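
To illustrate the first suggestion, here is a minimal, self-contained sketch of
the "snapshot under the lock, resolve after releasing it" pattern. It is not
Doris code: `TableStub`, `replicaIdsPerTablet`, and `slowResolver` are
hypothetical stand-ins for the `OlapTable`, the first partition's
tablets/replicas, and the potentially blocking cloud backend-id lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToLongFunction;

public class SnapshotThenResolve {
    /** Hypothetical stand-in for a table whose metadata is guarded by a read/write lock. */
    public static final class TableStub {
        public final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // One inner list of replica ids per tablet (bucket) of the first partition.
        public final List<List<Long>> replicaIdsPerTablet = new ArrayList<>();
    }

    /**
     * Copies the replica ids while holding the read lock, then calls the
     * (possibly slow) resolver only after the lock has been released.
     * Negative backend ids returned by the resolver are skipped.
     */
    public static List<List<Long>> resolveBucketBackends(TableStub table,
                                                         ToLongFunction<Long> slowResolver) {
        List<List<Long>> snapshot = new ArrayList<>();
        table.lock.readLock().lock();
        try {
            // Cheap, in-memory copy only: no RPC and no sleeping in this section.
            for (List<Long> replicas : table.replicaIdsPerTablet) {
                snapshot.add(new ArrayList<>(replicas));
            }
        } finally {
            table.lock.readLock().unlock();
        }

        // The lock is released here, so a slow lookup cannot block writers.
        List<List<Long>> bucketSeq = new ArrayList<>(snapshot.size());
        for (List<Long> replicas : snapshot) {
            List<Long> bucketBackends = new ArrayList<>();
            for (Long replicaId : replicas) {
                long backendId = slowResolver.applyAsLong(replicaId);
                if (backendId >= 0) {
                    bucketBackends.add(backendId);
                }
            }
            bucketSeq.add(bucketBackends);
        }
        return bucketSeq;
    }
}
```

The trade-off is that the displayed sequence may be slightly stale relative to
the table's current metadata, which seems acceptable for a read-only `SHOW PROC`
path.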
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]