foxtail463 opened a new pull request, #64036:
URL: https://github.com/apache/doris/pull/64036
### What problem does this PR solve?
Problem Summary:
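Nested MV rewrite needs to distinguish two different identities during fuzzy StructInfo collection: the coarse table family that serves as a search-space key, and the exact set of relations in one concrete candidate. The following shape shows why: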
```sql
-- Query side: base table + view.
SELECT ...
FROM fact_src t
LEFT JOIN dim_full d0
ON ...
LEFT JOIN v_dim_full_non_double d1
ON ...;
-- v_dim_full_non_double is a view over dim_full.
CREATE VIEW v_dim_full_non_double AS
SELECT ...
FROM dim_full
WHERE double_flag = '0';
-- Child MVs.
CREATE MATERIALIZED VIEW mv_fact AS
SELECT ...
FROM fact_src;
CREATE MATERIALIZED VIEW mv_dim_full AS
SELECT ...
FROM dim_full;
CREATE MATERIALIZED VIEW mv_dim_full_view_non_double AS
SELECT ...
FROM v_dim_full_non_double;
-- Target MV side: child MVs.
CREATE MATERIALIZED VIEW mv_target AS
SELECT ...
FROM mv_fact t
LEFT JOIN mv_dim_full d0
ON ...
LEFT JOIN mv_dim_full_view_non_double d1
ON ...;
```
In this shape, child rewrite can first introduce MV scan relations into the memo. The parent group should then be able to build a candidate plan from those MV scan relations and match mv_target.
The old StructInfo candidate path used the table/common-table-id based cache key in StructInfoMap's candidate map to organize memo candidates. That key only describes the table family covered by one MV definition; it is a search-space key, not the identity of a concrete candidate. The exact candidate identity is relationIdSet, which describes the relations contained in one memo candidate plan tree.
In the example above, the rewritten scan candidate for mv_dim_full and the rewritten scan candidate for mv_dim_full_view_non_double can fall into the same table/common-table-id cache key while representing different relationIdSet values. If one candidate overwrites the other, or is reused in its place, the parent mv_target candidate is assembled with the wrong child relation, so the final target MV rewrite becomes path-sensitive and may fail.
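
The collision can be illustrated with a toy model. The sketch below is not Doris code: the ids, the dictionary, and the names `flat_cache` and `put_flat` are hypothetical and only mimic a cache keyed by the table family alone.

```python
# Toy model of a candidate cache keyed only by the coarse table family.
# All ids and names are hypothetical; this is not the Doris implementation.

# Assume both child MV scans fall into the same table family (dim_full, table id 2).
DIM_FULL_FAMILY = frozenset({2})

# Two distinct memo candidates, identified by the relation ids they contain.
scan_mv_dim_full = {
    "plan": "scan(mv_dim_full)",
    "relation_ids": frozenset({10}),
}
scan_mv_dim_view = {
    "plan": "scan(mv_dim_full_view_non_double)",
    "relation_ids": frozenset({11}),
}

flat_cache = {}


def put_flat(table_family, candidate):
    """Store a candidate under the table-family key alone (the lossy scheme)."""
    flat_cache[table_family] = candidate


put_flat(DIM_FULL_FAMILY, scan_mv_dim_full)
put_flat(DIM_FULL_FAMILY, scan_mv_dim_view)  # silently replaces the first candidate

# Only one candidate survives, so what the parent sees depends on insertion order.
surviving = flat_cache[DIM_FULL_FAMILY]
```

Swapping the two `put_flat` calls changes which candidate survives, which is the kind of order dependence the description calls path-sensitive.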
This refactor makes the identity boundary explicit:
- use table ids only to expand the relation search space for an MV
- use exact relationIdSet as the StructInfo candidate identity
- cache candidates by target relation search space, with exact relationIdSet as the inner key
- register tableId -> relationId when catalog relations enter memo, including
nested MV scan relations
- clear StructInfoMap candidate caches when relation identity changes
- keep candidate plan materialization lazy until StructInfo is actually needed
This lets base-table, view-derived, and rewritten MV-scan candidates coexist under the same coarse table family without overwriting each other.
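
A minimal sketch of the identity boundary described above, under stated assumptions: `Candidate`, `CandidateCache`, and every method name are hypothetical and do not mirror the actual StructInfoMap API. It only shows a two-level cache (search space outside, exact relation id set inside), tableId -> relationId registration with cache clearing, and lazy plan construction.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set


class Candidate:
    """A memo candidate whose plan is built only on first access."""

    def __init__(self, relation_ids: FrozenSet[int], build_plan: Callable[[], str]):
        self.relation_ids = relation_ids
        self._build_plan = build_plan
        self._plan: Optional[str] = None

    @property
    def plan(self) -> str:
        if self._plan is None:
            self._plan = self._build_plan()  # materialize lazily
        return self._plan


class CandidateCache:
    """Hypothetical two-level cache: search space -> relation id set -> candidate."""

    def __init__(self) -> None:
        self.table_to_relations: Dict[int, Set[int]] = {}
        self._cache: Dict[FrozenSet[int], Dict[FrozenSet[int], Candidate]] = {}

    def register_relation(self, table_id: int, relation_id: int) -> None:
        """Record tableId -> relationId; clear the cache if identity changed."""
        known = self.table_to_relations.setdefault(table_id, set())
        if relation_id not in known:
            known.add(relation_id)
            self._cache.clear()

    def search_space(self, table_ids: Iterable[int]) -> FrozenSet[int]:
        """Expand table ids into all relation ids registered for them."""
        expanded: Set[int] = set()
        for table_id in table_ids:
            expanded |= self.table_to_relations.get(table_id, set())
        return frozenset(expanded)

    def put(self, table_ids: Iterable[int], candidate: Candidate) -> None:
        inner = self._cache.setdefault(self.search_space(table_ids), {})
        inner[candidate.relation_ids] = candidate

    def get(self, table_ids: Iterable[int], relation_ids: Iterable[int]) -> Optional[Candidate]:
        inner = self._cache.get(self.search_space(table_ids), {})
        return inner.get(frozenset(relation_ids))


built = []  # records which plans were actually materialized


def make_builder(name: str) -> Callable[[], str]:
    def build() -> str:
        built.append(name)
        return f"scan({name})"
    return build


cache = CandidateCache()
cache.register_relation(table_id=2, relation_id=10)
cache.register_relation(table_id=2, relation_id=11)
cache.put({2}, Candidate(frozenset({10}), make_builder("mv_dim_full")))
cache.put({2}, Candidate(frozenset({11}), make_builder("mv_dim_full_view_non_double")))

first = cache.get({2}, {10})
second = cache.get({2}, {11})
```

Both candidates share the outer key for table id 2 but remain separately retrievable, and neither plan is built until `plan` is read. Registering a new relation id afterwards empties the cache in this sketch, which is one simple way to model clearing on identity change.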
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [x] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->