morrySnow commented on PR #61146:
URL: https://github.com/apache/doris/pull/61146#issuecomment-4132316484

   ## Deep Dive: Remaining 7 Non-GraphSimplifier Issues
   
   ### Issue #4: `getLiteralAlias()` destructive removal breaks retry (HIGH — 
CONFIRMED)
   
   **Evidence chain:**
   1. `JoinOrderJob.tryEnumerateJoin()` (line 94-101): first calls 
`subgraphEnumerator.enumerate()`, and if it fails, calls 
`graphSimplifier.simplifyGraph(limit) && subgraphEnumerator.enumerate()` 
(retry).
   2. During the first `enumerate()`, `PlanReceiver.addGroup()` calls 
`proposeProject()` which calls `hyperGraph.getLiteralAlias(bitmap, bitmap)`.
   3. `HyperGraph.getLiteralAlias()` does `nodeToLiteralAlias.remove(left)` on 
leaf groups (when `left == right`).
   4. `PlanReceiver.reset()` clears `planTable` and `usdEdges` but **does NOT 
restore** `nodeToLiteralAlias`.
   5. On the retry `enumerate()` call, `getLiteralAlias()` finds empty map → 
literal alias projections are lost.
   
   **Fix:** Either make `getLiteralAlias()` non-destructive (don't `remove()`, 
just `get()`), or save/restore `nodeToLiteralAlias` in `PlanReceiver.reset()`. 
Since `nodeToLiteralAlias` lives in `HyperGraph` (not `PlanReceiver`), the 
cleanest fix is to not mutate it:
   
   ```java
   // Before:
   nodeToLiteralAlias.remove(left);
   // After:
   // Just read without removing — the map is consulted per-join anyway
   ```
   
   ---
   
   ### Issue #5: `foundEdgesContain` corrupts cached `containSimpleEdges` 
BitSet (HIGH — CONFIRMED)
   
   **Evidence:** `SubgraphEnumerator.EdgeCalculator`:
   ```java
   public List<Edge> foundEdgesContain(long subgraph) {
       BitSet edgeMap = containSimpleEdges.get(subgraph);  // reference, not 
copy
       Preconditions.checkState(edgeMap != null);
       edgeMap.or(containComplexEdges.get(subgraph));       // mutates cached 
entry!
       return 
edgeMap.stream().mapToObj(edges::get).collect(Collectors.toList());
   }
   ```
   
   Note: `foundEdgesContain()` is currently **not called** in active code paths 
(only `foundSimpleEdgesContain` and `foundComplexEdgesContain` are used 
separately). So this is a latent bug / landmine. Still should be fixed as it's 
`public`.
   
   **Fix:** `BitSet edgeMap = (BitSet) 
containSimpleEdges.get(subgraph).clone();`
   
   ---
   
   ### Issue #6: `dpHypOptimize()` restores stale `rewritePlan` (HIGH — 
CONFIRMED)
   
   **Evidence flow:**
   ```java
   Plan oldRewrittenPlan = cascadesContext.getRewritePlan();  // save old
   // ... build rewritten plan from DPhyper output ...
   cascadesContext.setRewritePlan(rewritten);  // set new
   cascadesContext.toMemo();                    // memo built from 'rewritten'
   cascadesContext.setRewritePlan(oldRewrittenPlan);  // RESTORE OLD — mismatch!
   ```
   
   After this, `cascadesContext.getRewritePlan()` returns the pre-DPhyper plan 
while memo contains the post-DPhyper structure. Any code that reads 
`getRewritePlan()` later (e.g., for error reporting, plan display, or fallback) 
sees stale data.
   
   **Verdict:** This looks intentional to preserve the "original rewrite plan" 
for potential later use, but it creates an invariant violation. If 
`getRewritePlan()` is only used before optimization, this is harmless. But it's 
fragile and undocumented.
   
   **Fix:** Either don't restore (keep aligned), or add a comment explaining 
the intentional divergence.
   
   ---
   
   ### Issue #7: `getBestLogicalPlan` silent fallback (MEDIUM — NUANCED)
   
   **Logic:**
   ```java
   public Plan getBestLogicalPlan(GroupExpression groupExpression) {
       // Try to match physical plan's children to a logical expression
       for (GroupExpression logicalExpression : logicalExpressions) {
           if (childrenGroups.equals(logicalExpression.children())) {
               return logicalExpression.getPlan();  // exact match
           }
       }
       // Fallback: skip enforcers (Distribute/Sort), otherwise return FIRST 
logical expr
       if (groupExpression.getPlan() instanceof PhysicalDistribute
               || groupExpression.getPlan() instanceof PhysicalQuickSort || 
logicalExpressions.isEmpty()) {
           return null;  // enforcers have no logical equivalent
       } else {
           return getLogicalExpression().getPlan();  // returns FIRST = 
possibly original
       }
   }
   ```
   
   The fallback `getLogicalExpression().getPlan()` returns the first logical 
expression, which may be the **original unoptimized** one. However, in 
`Memo.copyOutBestLogicalPlan()`, when this returns a non-null plan, it's used 
as the output. If DPhyper found a better join order but the physical plan's 
children don't match any logical expression's children (due to group merging), 
the fallback silently returns the old plan.
   
   **Verdict:** In practice, DPhyper creates both logical and physical 
expressions in the same groups, so the children match should usually succeed. 
The risk is in edge cases with group merging. This is a correctness concern but 
likely rare.
   
   ---
   
   ### Issue #10: `nonInnerJoinedNodes` overwrite (MEDIUM — CONFIRMED, LOW 
PRACTICAL IMPACT)
   
   **Code:**
   ```java
   } else {
       nonInnerJoinedNodes = LongBitmap.or(leftSubtreeNodes, 
rightSubtreeNodes);  // overwrites!
   ```
   
   Should be: `nonInnerJoinedNodes |= LongBitmap.or(leftSubtreeNodes, 
rightSubtreeNodes);`
   
   **Practical impact:** Due to the tree traversal order in `addJoin()`, a 
non-inner join at a higher level in the tree will always have its subtree nodes 
as a superset of any lower non-inner joins. But for sibling non-inner joins 
under an inner join, the first sibling's tracking is lost. E.g., 
`INNER(LEFT(T1,T2), LEFT(T3,T4))` — after processing LEFT(T3,T4), 
`nonInnerJoinedNodes = {T3,T4}`, losing `{T1,T2}`. This could lead to 
incorrectly splitting inner join conjuncts that reference T1/T2.
   
   ---
   
   ### Issue #11: Dead null-check in `DPhyperNode` (LOW — CONFIRMED)
   
   ```java
   public DPhyperNode(int index, Group group) {
       super(group.getLogicalExpression().getPlan(), index);  // NPE if null
       Preconditions.checkArgument(group != null, "...");     // dead code
   ```
   
   Per Doris coding standards, assertions are preferred over defensive checks. 
The `super()` call already fails fast on null. Either use 
`Objects.requireNonNull(group)` in the super call or remove the dead check.
   
   ---
   
   ### Issue #12: Tests swallow exceptions without failing (MEDIUM — CONFIRMED)
   
   Found in **two** test files, not just one:
   
   1. **`hypergraph/OtherJoinTest.java`** (line ~109):
   ```java
   } catch (Exception ex) {
       ex.printStackTrace(System.out);
       // NO Assertions.fail() or rethrow!
   }
   ```
   
   2. **`hypergraphv2/OtherJoinTest.java`** (line ~79): Same pattern.
   
   Both tests catch exceptions during optimization/evaluation and print them, 
but let the test pass. This masks real failures.
   
   **Fix:** Add `Assertions.fail("Unexpected exception: " + ex.getMessage(), 
ex);` in each catch block.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to