924060929 commented on PR #64167:
URL: https://github.com/apache/doris/pull/64167#issuecomment-4841898482
### Correctness issues found during review
Grouped by confidence. Locations are by class/method since line numbers will
shift; the key line is quoted for each.
**Confirmed**
1. **`PropertyAnalyzer.analyzeReplicaAllocation(Map, String)` silently drops
the forced-replication rewrite — regression, unrelated to this feature.**
The 2-arg overload now does `return analyzeReplicaAllocationImpl(properties,
prefix, true);`, while the `if
(!Config.force_olap_table_replication_allocation.isEmpty()) { properties =
forceRewriteReplicaAllocation(...); }` block was moved into the *new* 3-arg
overload only. Every existing 2-arg caller (CREATE TABLE, ADD PARTITION, ALTER
… SET replication, RESTORE) no longer honors
`force_olap_table_replication_allocation`.
*Fix:* have the 2-arg overload delegate to
`analyzeReplicaAllocation(properties, prefix, true)`.
2. **`TabletScheduler.chooseProperTag` — NPE on a null backend, affects all
tables.**
The added `if (tabletCtx.getExcludeTagSet().contains(be.getLocationTag()))
continue;` dereferences `be` before the existing short-circuit guard; `be =
infoService.getBackend(beId)` can be null (the `beId = -1` path, or a dropped
backend), and `excludeTagSet` defaults to `Collections.emptySet()` so the
argument is still evaluated for every table. Every sibling `delete*` method
added in this PR guards with `be != null`.
*Fix:* add the same `be != null` guard.
3. **`TenantLevelColocateTableCheckerAndBalancer.matchPartition` —
`checkState` halts repair for all tenant-colocate groups.**
`checkState(backendBucketsSeq.size() == index.getTablets().size())` assumes
each partition's bucket count equals the slave table's default, but `ADD
PARTITION … BUCKETS k` on a slave only requires k to be a multiple of the
master (per `checkSlaveDistribution`), so k can legally differ from the default
while `matchSlaveGroup` sizes the sequence from the default. `matchGroups` has
no per-group try/catch, so the IllegalStateException stops repair + balance for
*all* tenant-colocate groups.
*Fix:* size the expected sequence from the partition's bucket count, and/or
wrap per-group matching in try/catch as the relocate path does.
4. **`getFragmentBucketSeqLocationByTag` — `checkState(!result.isEmpty())`
fails the query (duplicated in `LoadBalanceScanWorkerSelector` and
`Coordinator`).**
After filtering a bucket's replica locations down to colocate-tag backends,
it throws if none remain. A tenant-colocate join where some bucket has no live
replica on a colocate-tag backend (replica loss / mid-clone / tag changed via
ALTER SYSTEM) aborts the whole query.
*Fix:* fall back to the unfiltered locations (or disable colocate for that
query) instead of asserting.
**Needs confirmation (mechanism is real; trigger depends on timing/state)**
5. **`JoinUtils.matchTenantLevelColocateGroup` —
`checkState(!bucketMap.isEmpty())` at plan time.** `getStableGroup` can return
a group whose backend sequence is transiently empty → planning throws instead
of degrading to a non-colocate plan.
6. **`ReportHandler.addReplica` tenant-colocate fall-through uses the full
`replicaAlloc`.** When the per-tag health resolves to a status outside the
keep-set, a still-needed non-colocate-tag replica may be deleted (data-loss
risk). Worth tracing the keep-set with a transient per-tag
over/under-replication case.
7. **Slave creation when the master group has no materialized backend
sequence.** `createTablets` →
`checkState(backendsPerBucketSeq.containsKey(tag))` throws an opaque
IllegalStateException (null case); and after edit-log replay the sequence can
be a non-null *empty* list, where `getSlaveBackendsPerBucketSeq` does `i %
backendsPerBucketSeq.size()` → ArithmeticException. The sibling
`getSlaveBackendsPerBucketSeqSet` already guards `isEmpty()`; this one doesn't.
*Fix:* add the `isEmpty()` guard, and surface a `DdlException` instead of a
bare `checkState`.
8. **`NereidsPlanner.getFragmentColocateDataWithSameSize` uses
`ThreadLocalRandom`** to pick which tag drives the fragment's bucket count →
the same query can plan to different bucket counts / parallelism run-to-run
(breaks plan reproducibility and query cache). More a quality issue than a
crash.
9. **`TenantLevelColocateTableIndex.removeMasterGroup`** does
`group2Schema.remove(groupId).getName()` with no null check, across two
separate lock acquisitions → NPE under concurrent removal of the last master
tables of the same group, leaving the index half-cleaned.
> Note: the `DistributionSpecHash.colocateData`-not-in-`equals/hashCode`
hazard is described in the design comment, since it's tied to the decision to
carry layout data in the spec.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]