Re: [PR] [feat](colocate) support tenant-level colocation [doris]

via GitHub Tue, 30 Jun 2026 02:32:51 -0700


924060929 commented on PR #64167:
URL: https://github.com/apache/doris/pull/64167#issuecomment-4841898236


   ### Design-level feedback: generalize the existing colocate group instead of 
adding a parallel stack
   
   First — the two problems this PR targets are real and worth solving. 
Tenant-level colocation in particular is genuinely Doris-specific (it falls out 
of stacking colocation on the resource-tag multi-tenancy model, so there's no 
off-the-shelf design to copy). This comment is about the *implementation path*, 
not any single line; correctness issues are in a separate comment.
   
   **The existing `ColocateTableIndex` already has the tag dimension this 
feature needs:**
   - `ColocateGroupSchema` already carries `replicaAlloc` (effectively 
`Map<Tag, Short>`);
   - `group2BackendsPerBucketSeq` is already `Table<GroupId, Tag, 
List<List<Long>>>` — the backend sequence is *already* keyed by `(group, tag)`;
   - the balancer already runs per tag: `getBackendsPerBucketSeqByTag(groupId, 
tag)`, `relocateAndBalance(groupId, tag, …)`;
   - and the new index's own `groupName2Id` is `Table<String, Tag, Long>` — 
i.e. "the real group key is `(name, tag)`" is already the insight here; it was 
just built as a new class instead of lifting V1's `Map<String, GroupId>` / 
`table2Group` one dimension.
   
   **The PR bundles two orthogonal axes into one stack:**
   - **scope** — which nodes to colocate on: all-tags → per-tag;
   - **alignment** — how two tables' buckets map to the same slot: identity 
(equal bucketNum) → modulo (integer multiple, the slave case).
   
   These are independent. Bundling them is what forces the fork — kept 
orthogonal, *k* scopes × *m* alignments need `k + m` small strategies, not `k × 
m` stacks.
   
   **My main concern is that the isolation is only half-done.** The metadata 
layer is isolated (a second index), but the execution/scheduling layer is not: 
`DistributionSpecHash` now carries `colocateData` (note it isn't part of that 
class's `equals`/`hashCode`/`satisfy`, which is a latent memo hazard — two 
specs that differ only in colocate layout compare equal), and `Coordinator` / 
`ReportHandler` / `TabletScheduler` each grow an `isColocateV2` branch that 
doesn't fully re-establish the invariants V1 holds on those paths. So it lands 
in the worst middle ground: it pays the dual-stack maintenance cost, doesn't 
get clean isolation, and introduces invariant-drift bugs on shared paths (see 
the correctness comment for concrete cases).
   
   **Suggested direction:** lift the group key to `(name, tag)` — the global 
group becomes the default-tag special case — and model master/slave as a 
per-table *role* whose bucket→groupBucket mapping is a small strategy (identity 
/ modulo). Then the balancer, health checker, and scan scheduling depend only 
on the abstract `groupBucketSeq` + alignment, never on "v1 vs v2". That also 
removes the need to thread `colocateData` through `DistributionSpecHash` at all 
— colocate detection can just query the index by `tableId + tag`, exactly as V1 
does today.
   
   **Concretely, I'd suggest picking one end rather than the middle:**
   1. **Fully isolate** — don't touch `DistributionSpecHash` or the shared 
scheduling classes; route the v2 decision through its own entry point. Lowest 
risk, smallest delta from here.
   2. **Fully unify** — lift the dimension as above. Cleanest long-term: one 
index, one balancer, one set of invariants.
   
   To be fair, a parallel stack is a *defensible* choice if the goal is to 
shield the blast radius of a P0-prone core module and avoid V1 image/editlog 
migration — that's a real trade-off. If that's the driver, could we (a) make 
the isolation complete rather than partial, and (b) note the rationale in the 
PR description? The deferred cost is long-term dual-stack maintenance and two 
balancer/health paths that will drift.
   
   One caveat so this isn't oversold: unifying saves the *stack*, not the 
inherent complexity — the slave modulo mapping and the per-tag health split 
still have to be written either way; the abstraction just makes them live in 
one place under one set of invariants.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [feat](colocate) support tenant-level colocation [doris]

Reply via email to