morrySnow commented on PR #61146:
URL: https://github.com/apache/doris/pull/61146#issuecomment-4132316484
## Deep Dive: Remaining 7 Non-GraphSimplifier Issues
### Issue #4: `getLiteralAlias()` destructive removal breaks retry (HIGH —
CONFIRMED)
**Evidence chain:**
1. `JoinOrderJob.tryEnumerateJoin()` (line 94-101): first calls
`subgraphEnumerator.enumerate()`, and if it fails, calls
`graphSimplifier.simplifyGraph(limit) && subgraphEnumerator.enumerate()`
(retry).
2. During the first `enumerate()`, `PlanReceiver.addGroup()` calls
`proposeProject()` which calls `hyperGraph.getLiteralAlias(bitmap, bitmap)`.
3. `HyperGraph.getLiteralAlias()` does `nodeToLiteralAlias.remove(left)` on
leaf groups (when `left == right`).
4. `PlanReceiver.reset()` clears `planTable` and `usdEdges` but **does NOT
restore** `nodeToLiteralAlias`.
5. On the retry `enumerate()` call, `getLiteralAlias()` finds empty map →
literal alias projections are lost.
**Fix:** Either make `getLiteralAlias()` non-destructive (don't `remove()`,
just `get()`), or save/restore `nodeToLiteralAlias` in `PlanReceiver.reset()`.
Since `nodeToLiteralAlias` lives in `HyperGraph` (not `PlanReceiver`), the
cleanest fix is to not mutate it:
```java
// Before:
nodeToLiteralAlias.remove(left);
// After:
// Just read without removing — the map is consulted per-join anyway
```
---
### Issue #5: `foundEdgesContain` corrupts cached `containSimpleEdges`
BitSet (HIGH — CONFIRMED)
**Evidence:** `SubgraphEnumerator.EdgeCalculator`:
```java
public List<Edge> foundEdgesContain(long subgraph) {
BitSet edgeMap = containSimpleEdges.get(subgraph); // reference, not
copy
Preconditions.checkState(edgeMap != null);
edgeMap.or(containComplexEdges.get(subgraph)); // mutates cached
entry!
return
edgeMap.stream().mapToObj(edges::get).collect(Collectors.toList());
}
```
Note: `foundEdgesContain()` is currently **not called** in active code paths
(only `foundSimpleEdgesContain` and `foundComplexEdgesContain` are used
separately). So this is a latent bug / landmine. Still should be fixed as it's
`public`.
**Fix:** `BitSet edgeMap = (BitSet)
containSimpleEdges.get(subgraph).clone();`
---
### Issue #6: `dpHypOptimize()` restores stale `rewritePlan` (HIGH —
CONFIRMED)
**Evidence flow:**
```java
Plan oldRewrittenPlan = cascadesContext.getRewritePlan(); // save old
// ... build rewritten plan from DPhyper output ...
cascadesContext.setRewritePlan(rewritten); // set new
cascadesContext.toMemo(); // memo built from 'rewritten'
cascadesContext.setRewritePlan(oldRewrittenPlan); // RESTORE OLD — mismatch!
```
After this, `cascadesContext.getRewritePlan()` returns the pre-DPhyper plan
while memo contains the post-DPhyper structure. Any code that reads
`getRewritePlan()` later (e.g., for error reporting, plan display, or fallback)
sees stale data.
**Verdict:** This looks intentional to preserve the "original rewrite plan"
for potential later use, but it creates an invariant violation. If
`getRewritePlan()` is only used before optimization, this is harmless. But it's
fragile and undocumented.
**Fix:** Either don't restore (keep aligned), or add a comment explaining
the intentional divergence.
---
### Issue #7: `getBestLogicalPlan` silent fallback (MEDIUM — NUANCED)
**Logic:**
```java
public Plan getBestLogicalPlan(GroupExpression groupExpression) {
// Try to match physical plan's children to a logical expression
for (GroupExpression logicalExpression : logicalExpressions) {
if (childrenGroups.equals(logicalExpression.children())) {
return logicalExpression.getPlan(); // exact match
}
}
// Fallback: skip enforcers (Distribute/Sort), otherwise return FIRST
logical expr
if (groupExpression.getPlan() instanceof PhysicalDistribute
|| groupExpression.getPlan() instanceof PhysicalQuickSort ||
logicalExpressions.isEmpty()) {
return null; // enforcers have no logical equivalent
} else {
return getLogicalExpression().getPlan(); // returns FIRST =
possibly original
}
}
```
The fallback `getLogicalExpression().getPlan()` returns the first logical
expression, which may be the **original unoptimized** one. However, in
`Memo.copyOutBestLogicalPlan()`, when this returns a non-null plan, it's used
as the output. If DPhyper found a better join order but the physical plan's
children don't match any logical expression's children (due to group merging),
the fallback silently returns the old plan.
**Verdict:** In practice, DPhyper creates both logical and physical
expressions in the same groups, so the children match should usually succeed.
The risk is in edge cases with group merging. This is a correctness concern but
likely rare.
---
### Issue #10: `nonInnerJoinedNodes` overwrite (MEDIUM — CONFIRMED, LOW
PRACTICAL IMPACT)
**Code:**
```java
} else {
nonInnerJoinedNodes = LongBitmap.or(leftSubtreeNodes,
rightSubtreeNodes); // overwrites!
```
Should be: `nonInnerJoinedNodes |= LongBitmap.or(leftSubtreeNodes,
rightSubtreeNodes);`
**Practical impact:** Due to the tree traversal order in `addJoin()`, a
non-inner join at a higher level in the tree will always have its subtree nodes
as a superset of any lower non-inner joins. But for sibling non-inner joins
under an inner join, the first sibling's tracking is lost. E.g.,
`INNER(LEFT(T1,T2), LEFT(T3,T4))` — after processing LEFT(T3,T4),
`nonInnerJoinedNodes = {T3,T4}`, losing `{T1,T2}`. This could lead to
incorrectly splitting inner join conjuncts that reference T1/T2.
---
### Issue #11: Dead null-check in `DPhyperNode` (LOW — CONFIRMED)
```java
public DPhyperNode(int index, Group group) {
super(group.getLogicalExpression().getPlan(), index); // NPE if null
Preconditions.checkArgument(group != null, "..."); // dead code
```
Per Doris coding standards, assertions are preferred over defensive checks.
The `super()` call already fails fast on null. Either use
`Objects.requireNonNull(group)` in the super call or remove the dead check.
---
### Issue #12: Tests swallow exceptions without failing (MEDIUM — CONFIRMED)
Found in **two** test files, not just one:
1. **`hypergraph/OtherJoinTest.java`** (line ~109):
```java
} catch (Exception ex) {
ex.printStackTrace(System.out);
// NO Assertions.fail() or rethrow!
}
```
2. **`hypergraphv2/OtherJoinTest.java`** (line ~79): Same pattern.
Both tests catch exceptions during optimization/evaluation and print them,
but let the test pass. This masks real failures.
**Fix:** Add `Assertions.fail("Unexpected exception: " + ex.getMessage(),
ex);` in each catch block.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]