nooneuse opened a new pull request, #64891:
URL: https://github.com/apache/doris/pull/64891

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   - This PR introduces **Multi-Stage Predicate Lazy Materialization 
(multi-stage predicate LM)** in the BE storage scan path (`SegmentIterator`) to 
reduce unnecessary predicate-column reads and predicate evaluation cost.
   - In many OLAP queries with multiple conjuncts, evaluating all predicates 
requires reading multiple predicate columns up-front. When some predicate 
columns are expensive to read/decode, this can cause avoidable I/O and CPU 
overhead even if many rows could be filtered out early.
   - Multi-stage predicate LM splits predicate evaluation into two phases:
     - **Stage1** evaluates a selected subset of predicate columns first for 
coarse filtering.
     - **Stage2** evaluates the remaining predicate columns only on the 
surviving rows, using either:
       - **by-rowids** (selective Stage1 → read late predicate columns by 
rowids to reduce rows read), or
       - **by-all-rows** (high survival ratio → scan all rows for late 
predicate columns to avoid excessive random reads).
   - The feature is **manually controlled via session variables / per-statement 
overrides** (hint-like behavior), enabling users to experiment and validate 
benefits with profile metrics before any future cost-based automation.
   
   ### Release note
   
   - Added **multi-stage predicate lazy materialization** in BE 
`SegmentIterator`, splitting predicate evaluation into Stage1/Stage2 and 
exposing profile counters (e.g. `PredicateLMStage1InputRows`, 
`PredicateLMStage2ByRowIdsBatches`, `PredicateLMStage2ByAllRowsBatches`).
   - Added/exposed session variables:
     - `enable_multi_stage_predicate_lm` (bool): enable/disable multi-stage 
predicate LM.
     - `predicate_lm_stage1_cols` (string): optionally specify Stage1 predicate 
columns.
     - `predicate_lm_stage1_survival_ratio_threshold` (double): threshold to 
choose Stage2 strategy (by-rowids vs by-all-rows).
   - Enhanced `predicate_lm_stage1_cols` parsing and scoping:
     - Tolerates whitespace/backticks/duplicates.
     - Supports scoped identifiers `col` / `table.col` / `db.table.col` to 
target a specific scan in multi-table queries.
     - Uses scan context (db/table name; rollup suffix handling) for accurate 
scoped matching.
   - Improved robustness and safer defaults:
     - Unknown columns or non-matching scoped tokens in 
`predicate_lm_stage1_cols` are **ignored** instead of failing the query, 
avoiding multi-table query failures caused by schema differences.
     - When there is **no runtime filter** and the effective Stage1 column list 
is empty, the scan **falls back to single-stage predicate evaluation** 
(equivalent to multi-stage disabled for that scan) to avoid unpredictable “pick 
an arbitrary predicate column as Stage1” behavior and potential regressions.
   - Extended scan-node thrift payload to carry `db_name` for accurate 
`db.table.col` matching without changing `table_name` formatting; FE populates 
`db_name` and BE consumes it.
   - Updated BE unit tests, regression tests, and documentation accordingly.
   
   ### Detailed Description
   
   # Multi-Stage Predicate Lazy Materialization
   
   ## 1. Overview (Summary)
   
   Multi-Stage Predicate Lazy Materialization (multi-stage predicate LM) is a 
storage-layer scan optimization.
   It splits “predicate column reading + predicate evaluation” into **two 
stages (Stage1/Stage2)**. By “reading only a small subset of predicate columns 
first for coarse filtering, then evaluating the remaining predicates on the 
surviving rows”, it reduces unnecessary column reads and computation overhead.
   
   At the current stage, this feature is **manually configured**. It is 
intended to behave like a hint to influence execution behavior. In the future 
PR, the FE will leverage statistics to automatically choose suitable columns.
   
   Key goals:
   - Reduce total predicate-column I/O (especially when some columns are 
expensive to read/deserialize, or there are many predicate columns)
   - Reduce predicate evaluation cost (vectorized evaluation / short-circuit 
evaluation)
   - Reduce I/O in cases where predicates are selective
   
   ### Behavior at a Glance
   - **Stage1**: Read the configured stage1 predicate columns (plus any 
required delete-condition columns, runtime filters, etc.) and perform the first 
round of filtering.
   - **Stage2**: Perform the second round of filtering using the “remaining 
predicate columns” that are not included in Stage1 (late predicate columns).
     Stage2 has two strategies:
     - **by-rowids**: If the survival ratio after Stage1 is low, read Stage2 
predicate columns only for the rowids produced by Stage1 (random/rowid-based 
reads) to minimize the number of rows read.
     - **by-all-rows**: If the survival ratio after Stage1 is high, read Stage2 
predicate columns for all rows (sequential/index-order reads) to avoid the seek 
overhead caused by many random rowid reads.
   
   ---
   
   ## 2. Usage and Configuration (with SQL Cases)
   
   ### 2.1 Prerequisites
   - Note: This feature works on the **BE storage scan path** (SegmentIterator) 
and is typically controlled by FE session variables forwarded to BE.
   - For stronger validation or observability, you can enable profiling:
     - `set enable_profile=true;`
     - Use `EXPLAIN` / `PROFILE` / `show profile` to inspect metrics.
   
   ### 2.2 Configuration Options
   
   #### 2.2.1 `enable_multi_stage_predicate_lm`
   - Type: bool
   - Default: false
   - Purpose: Enable/disable multi-stage predicate LM.
   - Example:
     - `set enable_multi_stage_predicate_lm = true;`
   
   #### 2.2.2 `predicate_lm_stage1_cols`
   - Type: string
   - Default: empty string (meaning: Stage1 columns are not explicitly 
specified)
   - Purpose: Specify which columns’ predicates should be evaluated in Stage1 
(coarse filtering).
   - Notes:
     - The value is a comma-separated column list string, e.g.:
       - `a`
       - `a,b`
       - ` a ,``b``, a ` (whitespace/backticks/duplicates are allowed)
     - It also supports **scoping to a specific table / db.table** to precisely 
select which scan operator should apply the stage1 column configuration in 
multi-table queries:
       - `table.col`
       - `db.table.col`
     - Important:
       - Here `table` / `db.table` refers to the **real base table name**, not 
a SQL alias.
       - If a scoped token does not match the current scan (e.g., the scan is 
on `t1` but the config includes `t2.a`), that token will be ignored.
       - If a specified column does not exist (or does not exist in the current 
scan table schema), **it will not fail**; the token will be ignored (to avoid 
multi-table queries failing due to schema differences).
     - Default behavior when `predicate_lm_stage1_cols` is empty (or becomes 
empty after ignoring invalid / non-matching tokens):
       - If runtime filter predicate columns are available for the scan, 
SegmentIterator will use those runtime filter columns as Stage1.
       - If there is no runtime filter column available, the implementation 
**falls back to single-stage predicate evaluation** (equivalent to 
`enable_multi_stage_predicate_lm=false` for that scan). In this case, Stage2 
will not happen.
       - If you want multi-stage behavior for queries without runtime filters, 
explicitly configure `predicate_lm_stage1_cols` with at least one valid column 
for the target scan.
   - Examples:
     - `set predicate_lm_stage1_cols = 'a';`
     - `set predicate_lm_stage1_cols = ' a ,``b``, a ';`
     - `set predicate_lm_stage1_cols = 'lineitem.l_shipdate';`
     - `set predicate_lm_stage1_cols = 'tpch.lineitem.l_shipdate';`
   
   #### 2.2.3 `predicate_lm_stage1_survival_ratio_threshold`
   - Type: double
   - Default: 0.8 (storage-side default threshold)
   - Purpose: The Stage1 survival ratio threshold used to decide the Stage2 
strategy:
     - `survival_ratio <= threshold` → prefer `stage2-by-rowids`
     - `survival_ratio > threshold` → prefer `stage2-by-all-rows`
   
   ---
   
   ### 2.3 Recommended Validation (via Profile Metrics)
   In the `SegmentIterator` block of the profile, focus on:
   - `PredicateLMStage1InputRows`: number of input rows to Stage1
   - `PredicateLMStage1OutputRows`: number of output rows from Stage1 (rows 
surviving Stage1 filtering)
   - `PredicateLMStage2ByRowIdsBatches`: number of batches where Stage2 was 
triggered in by-rowids mode
   - `PredicateLMStage2ByAllRowsBatches`: number of batches where Stage2 was 
triggered in by-all-rows mode
   - `PredicateLMStage2RowsRead`: total rows read by Stage2 (semantics differ 
between by-rowids vs by-all-rows)
   
   How to tell whether multi-stage predicate LM is enabled and effective:
   - `PredicateLMStage1InputRows > 0` indicates the scan entered the Stage1 path
   - `PredicateLMStage2ByRowIdsBatches > 0` or 
`PredicateLMStage2ByAllRowsBatches > 0` indicates **Stage2 actually happened**
   - If both Stage2 batch counters stay at 0, it means Stage2 did not happen 
(either because it was not needed, or because the scan fell back to 
single-stage behavior)
   
   ---
   
   ### 2.4 SQL Case Examples (Typical Trigger Paths)
   
   The following examples use table `tbl_multi_stage_predicate_lm(k,a,b)`.
   
   #### Case A: Baseline (Feature Off)
   ```sql
   set enable_multi_stage_predicate_lm = false;
   set predicate_lm_stage1_cols = '';
   
   select count(*) from tbl_multi_stage_predicate_lm where a = 1 and b = 2;
   ```
   
   #### Case B: Feature On + Stage2-by-rowids
   Goal: Stage1 has a low survival ratio, so Stage2 reads late predicate 
columns by rowids.
   
   ```sql
   set enable_profile=true;
   set enable_multi_stage_predicate_lm = true;
   set predicate_lm_stage1_cols = 'a';
   
   -- a=1 is selective, Stage1 survival ratio is low, prefer stage2-by-rowids
   select /* rowids_case */ count(*)
   from tbl_multi_stage_predicate_lm
   where a = 1 and b = 2;
   ```
   
   Expected observation:
   - `PredicateLMStage2ByRowIdsBatches > 0`
   
   #### Case C: Feature On + Stage2-by-all-rows
   Goal: Stage1 has a high survival ratio, so Stage2 reads late predicate 
columns for all rows.
   
   ```sql
   set enable_profile=true;
   set enable_multi_stage_predicate_lm = true;
   set predicate_lm_stage1_cols = 'a';
   
   -- Stage1 has a high survival ratio (e.g. 95%), prefer stage2-by-all-rows
   select /* allrows_case */ count(*)
   from tbl_multi_stage_predicate_lm
   where a < 19 and b = 2;
   ```
   
   Expected observation:
   - `PredicateLMStage2ByAllRowsBatches > 0`
   
   #### Case D: Scoped to Table / DB.Table (Recommended for Multi-Table Queries)
   ```sql
   -- Only applies to column `col` of table `table_name`
   set predicate_lm_stage1_cols = 'table_name.col';
   
   -- Only applies to column `col` of table `db_name.table_name`
   set predicate_lm_stage1_cols = 'db_name.table_name.col';
   ```
   
   #### Case E: Invalid Column / Mismatched Scope Will Not Fail (Ignored)
   Rationale: To avoid multi-table queries failing due to schema differences, 
invalid column names or mismatched scoped tokens will be ignored.
   
   ```sql
   -- Non-existing column: no error; token will be ignored
   set predicate_lm_stage1_cols = 'not_exist';
   
   -- Scoped to another table: no error; token will be ignored
   set predicate_lm_stage1_cols = 'other_table.a';
   
   select count(*) from tbl_multi_stage_predicate_lm where a = 1 and b = 2;
   ```
   
   Note:
   - If the effective Stage1 column list becomes empty after ignoring invalid / 
non-matching tokens, and there is no runtime filter column available, the scan 
will fall back to single-stage predicate evaluation (Stage2 will not happen). 
Use profile metrics to confirm whether Stage2 is triggered.
   
   ---
   
   ## 3. Applicable Scenarios
   
   This feature is best suited for scenarios below (the more conditions are 
met, the more likely you will see gains):
   
   - Many predicate columns (many AND conjuncts), and some of those columns are 
expensive to read/decode
   - You can choose one or more “cheap and highly selective” columns as Stage1 
(e.g., highly selective equality predicates, low-cost types)
   - Wide tables with many non-predicate columns, and lazy materialization is 
enabled/beneficial (filter first, then read non-predicate columns)
   - Queries have meaningful filtering (not a full scan or a high-survival scan 
with little filtering)
   - Runtime filters can serve as fast Stage1 filters (depending on 
implementation strategy)
   
   Recommendations before enabling broadly:
   - Enable `enable_profile=true` and verify that the `SegmentIterator` block 
metrics `PredicateLMStage1*` / `PredicateLMStage2*` are actually hit.
   - Run a one-time A/B profile comparison on critical queries to confirm the 
effect.
   
   Typical example:
   - `WHERE a = const AND (b = const OR c IN (...)) AND d > const ...`
     where `a` is highly selective and cheap, making it a good Stage1 candidate.
   
   ---
   
   ## 4. Risks and Notes
   
   ### 4.1 Performance Regression Risk (Misconfiguration)
   - If you put “low-selectivity / high-cost” columns into Stage1, Stage1 
read/eval cost may increase and offset gains (or even regress).
   - If Stage1 survival ratio stays high but Stage2 is still frequently 
triggered, total read volume may approach or exceed the baseline (especially 
with stage2-by-all-rows).
   
   Recommendations:
   - Prefer **high-selectivity, low read-cost** columns in Stage1.
   - Use A/B profiling on common query patterns before finalizing the stage1 
column strategy.
   
   ### 4.2 Random Read Risk (Misconfiguration)
   - When stage2-by-rowids is chosen, rowid-based reads may increase seeks and 
can be unfriendly under certain column/page layouts.
   - This is why `survival_ratio_threshold` exists: when survival ratio is 
high, prefer all-rows reads to avoid random I/O.
   
   Recommendations:
   - Tune the threshold reasonably and validate with real data distributions.
   
   ### 4.3 Multi-Table Query Considerations (Stage1 Column Scope)
   - If a query scans multiple internal OLAP tables:
     - Using an unqualified column name (e.g. `a`) may affect multiple scans at 
once (if those tables share the same column name).
     - Prefer `table.col` / `db.table.col` to scope the configuration to the 
target table and avoid unintended effects.
   - If a query joins with External Catalog tables:
     - The external scan path does not support multi-stage predicate LM, so 
`predicate_lm_stage1_cols` does not take effect on the external scan.
   
   ### 4.4 Silent Misconfiguration Risk
   - Invalid column names or non-matching scoped tokens are ignored (no error). 
This avoids query failures but may also cause the configuration to have no 
effect.
   - If the effective Stage1 column list becomes empty and there is no runtime 
filter column available, the scan may fall back to single-stage behavior.
   
   Recommendations:
   - Enable profiling and verify Stage1/Stage2 metrics to confirm the feature 
is effective for the target query.
   - For queries without runtime filters, explicitly configure 
`predicate_lm_stage1_cols` with at least one valid column for the target scan.
   
   ---
   
   ## 5. Performance Data
   WIP
   
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to