924060929 commented on PR #64167: URL: https://github.com/apache/doris/pull/64167#issuecomment-4841898236
### Design-level feedback: generalize the existing colocate group instead of adding a parallel stack First — the two problems this PR targets are real and worth solving. Tenant-level colocation in particular is genuinely Doris-specific (it falls out of stacking colocation on the resource-tag multi-tenancy model, so there's no off-the-shelf design to copy). This comment is about the *implementation path*, not any single line; correctness issues are in a separate comment. **The existing `ColocateTableIndex` already has the tag dimension this feature needs:** - `ColocateGroupSchema` already carries `replicaAlloc` (effectively `Map<Tag, Short>`); - `group2BackendsPerBucketSeq` is already `Table<GroupId, Tag, List<List<Long>>>` — the backend sequence is *already* keyed by `(group, tag)`; - the balancer already runs per tag: `getBackendsPerBucketSeqByTag(groupId, tag)`, `relocateAndBalance(groupId, tag, …)`; - and the new index's own `groupName2Id` is `Table<String, Tag, Long>` — i.e. "the real group key is `(name, tag)`" is already the insight here; it was just built as a new class instead of lifting V1's `Map<String, GroupId>` / `table2Group` one dimension. **The PR bundles two orthogonal axes into one stack:** - **scope** — which nodes to colocate on: all-tags → per-tag; - **alignment** — how two tables' buckets map to the same slot: identity (equal bucketNum) → modulo (integer multiple, the slave case). These are independent. Bundling them is what forces the fork — kept orthogonal, *k* scopes × *m* alignments need `k + m` small strategies, not `k × m` stacks. **My main concern is that the isolation is only half-done.** The metadata layer is isolated (a second index), but the execution/scheduling layer is not: `DistributionSpecHash` now carries `colocateData` (note it isn't part of that class's `equals`/`hashCode`/`satisfy`, which is a latent memo hazard — two specs that differ only in colocate layout compare equal), and `Coordinator` / `ReportHandler` / `TabletScheduler` each grow an `isColocateV2` branch that doesn't fully re-establish the invariants V1 holds on those paths. So it lands in the worst middle ground: it pays the dual-stack maintenance cost, doesn't get clean isolation, and introduces invariant-drift bugs on shared paths (see the correctness comment for concrete cases). **Suggested direction:** lift the group key to `(name, tag)` — the global group becomes the default-tag special case — and model master/slave as a per-table *role* whose bucket→groupBucket mapping is a small strategy (identity / modulo). Then the balancer, health checker, and scan scheduling depend only on the abstract `groupBucketSeq` + alignment, never on "v1 vs v2". That also removes the need to thread `colocateData` through `DistributionSpecHash` at all — colocate detection can just query the index by `tableId + tag`, exactly as V1 does today. **Concretely, I'd suggest picking one end rather than the middle:** 1. **Fully isolate** — don't touch `DistributionSpecHash` or the shared scheduling classes; route the v2 decision through its own entry point. Lowest risk, smallest delta from here. 2. **Fully unify** — lift the dimension as above. Cleanest long-term: one index, one balancer, one set of invariants. To be fair, a parallel stack is a *defensible* choice if the goal is to shield the blast radius of a P0-prone core module and avoid V1 image/editlog migration — that's a real trade-off. If that's the driver, could we (a) make the isolation complete rather than partial, and (b) note the rationale in the PR description? The deferred cost is long-term dual-stack maintenance and two balancer/health paths that will drift. One caveat so this isn't oversold: unifying saves the *stack*, not the inherent complexity — the slave modulo mapping and the per-tag health split still have to be written either way; the abstraction just makes them live in one place under one set of invariants. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
