Re: [PR] [feat](colocate) support tenant-level colocation [doris]

via GitHub Tue, 30 Jun 2026 02:32:48 -0700


924060929 commented on PR #64167:
URL: https://github.com/apache/doris/pull/64167#issuecomment-4841898482


   ### Correctness issues found during review
   
   Grouped by confidence. Locations are by class/method since line numbers will 
shift; the key line is quoted for each.
   
   **Confirmed**
   
   1. **`PropertyAnalyzer.analyzeReplicaAllocation(Map, String)` silently drops 
the forced-replication rewrite — regression, unrelated to this feature.**
   The 2-arg overload now does `return analyzeReplicaAllocationImpl(properties, 
prefix, true);`, while the `if 
(!Config.force_olap_table_replication_allocation.isEmpty()) { properties = 
forceRewriteReplicaAllocation(...); }` block was moved into the *new* 3-arg 
overload only. Every existing 2-arg caller (CREATE TABLE, ADD PARTITION, ALTER 
… SET replication, RESTORE) no longer honors 
`force_olap_table_replication_allocation`.
   *Fix:* have the 2-arg overload delegate to 
`analyzeReplicaAllocation(properties, prefix, true)`.
   
   2. **`TabletScheduler.chooseProperTag` — NPE on a null backend, affects all 
tables.**
   The added `if (tabletCtx.getExcludeTagSet().contains(be.getLocationTag())) 
continue;` dereferences `be` before the existing short-circuit guard; `be = 
infoService.getBackend(beId)` can be null (the `beId = -1` path, or a dropped 
backend), and `excludeTagSet` defaults to `Collections.emptySet()` so the 
argument is still evaluated for every table. Every sibling `delete*` method 
added in this PR guards with `be != null`.
   *Fix:* add the same `be != null` guard.
   
   3. **`TenantLevelColocateTableCheckerAndBalancer.matchPartition` — 
`checkState` halts repair for all tenant-colocate groups.**
   `checkState(backendBucketsSeq.size() == index.getTablets().size())` assumes 
each partition's bucket count equals the slave table's default, but `ADD 
PARTITION … BUCKETS k` on a slave only requires k to be a multiple of the 
master (per `checkSlaveDistribution`), so k can legally differ from the default 
while `matchSlaveGroup` sizes the sequence from the default. `matchGroups` has 
no per-group try/catch, so the IllegalStateException stops repair + balance for 
*all* tenant-colocate groups.
   *Fix:* size the expected sequence from the partition's bucket count, and/or 
wrap per-group matching in try/catch as the relocate path does.
   
   4. **`getFragmentBucketSeqLocationByTag` — `checkState(!result.isEmpty())` 
fails the query (duplicated in `LoadBalanceScanWorkerSelector` and 
`Coordinator`).**
   After filtering a bucket's replica locations down to colocate-tag backends, 
it throws if none remain. A tenant-colocate join where some bucket has no live 
replica on a colocate-tag backend (replica loss / mid-clone / tag changed via 
ALTER SYSTEM) aborts the whole query.
   *Fix:* fall back to the unfiltered locations (or disable colocate for that 
query) instead of asserting.
   
   **Needs confirmation (mechanism is real; trigger depends on timing/state)**
   
   5. **`JoinUtils.matchTenantLevelColocateGroup` — 
`checkState(!bucketMap.isEmpty())` at plan time.** `getStableGroup` can return 
a group whose backend sequence is transiently empty → planning throws instead 
of degrading to a non-colocate plan.
   
   6. **`ReportHandler.addReplica` tenant-colocate fall-through uses the full 
`replicaAlloc`.** When the per-tag health resolves to a status outside the 
keep-set, a still-needed non-colocate-tag replica may be deleted (data-loss 
risk). Worth tracing the keep-set with a transient per-tag 
over/under-replication case.
   
   7. **Slave creation when the master group has no materialized backend 
sequence.** `createTablets` → 
`checkState(backendsPerBucketSeq.containsKey(tag))` throws an opaque 
IllegalStateException (null case); and after edit-log replay the sequence can 
be a non-null *empty* list, where `getSlaveBackendsPerBucketSeq` does `i % 
backendsPerBucketSeq.size()` → ArithmeticException. The sibling 
`getSlaveBackendsPerBucketSeqSet` already guards `isEmpty()`; this one doesn't.
   *Fix:* add the `isEmpty()` guard, and surface a `DdlException` instead of a 
bare `checkState`.
   
   8. **`NereidsPlanner.getFragmentColocateDataWithSameSize` uses 
`ThreadLocalRandom`** to pick which tag drives the fragment's bucket count → 
the same query can plan to different bucket counts / parallelism run-to-run 
(breaks plan reproducibility and query cache). More a quality issue than a 
crash.
   
   9. **`TenantLevelColocateTableIndex.removeMasterGroup`** does 
`group2Schema.remove(groupId).getName()` with no null check, across two 
separate lock acquisitions → NPE under concurrent removal of the last master 
tables of the same group, leaving the index half-cleaned.
   
   > Note: the `DistributionSpecHash.colocateData`-not-in-`equals/hashCode` 
hazard is described in the design comment, since it's tied to the decision to 
carry layout data in the spec.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [feat](colocate) support tenant-level colocation [doris]

Reply via email to