avamingli opened a new pull request, #990:
URL: https://github.com/apache/cloudberry/pull/990
Previously, write operations on partitioned tables would alter the
materialized view (MV) status of both ancestor and descendant tables, leading
to unnecessary invalidations of MVs that depended on unaffected partitions.
This commit introduces a more efficient approach to updating MV statuses for
partitioned tables by focusing on leaf partitions that have actually undergone
data changes.
## example:
```sql
create table par(a int, b int, c int) partition by range(b)
subpartition by range(c) subpartition template (start (1) end (3) every
(1))
(start(1) end(3) every(1));
insert into par values(1, 1, 1), (1, 1, 2), (2, 2, 1), (2, 2, 2);
create materialized view mv_par as select * from par;
create materialized view mv_par1 as select * from par_1_prt_1;
create materialized view mv_par1_1 as select * from par_1_prt_1_2_prt_1;
create materialized view mv_par1_2 as select * from par_1_prt_1_2_prt_2;
create materialized view mv_par2 as select * from par_1_prt_2;
create materialized view mv_par2_2 as select * from par_1_prt_2_2_prt2;
```

### INSERT INTO Partitioned Table
```sql
INSERT INTO P1 VALUES (1, 1, 1);
```
#### before this commit:

It is known that data was successfully inserted into P1, which is a
descendant node of P0. Due to the increased data, the status needs to be
propagated upwards to invalidate the materialized view MV_0 based on P0.
At the same time, P1 is a partitioned table, and it also needs to propagate
the status downwards to its child tables. Since it is unclear which child
tables have undergone data changes, all materialized views of the child tables
must be invalidated.
Consequently, the materialized views MV_0, MV_1, MV_1_1, and MV_1_2 are all
invalidated.
#### after this commit:

When data is inserted into a partitioned table, it is ultimately routed to
the child tables at the leaf nodes. The system monitors these leaf nodes for
any data changes and propagates the status upward, while ignoring the changes
directly made to the partition table P0 in the SQL.
The tuple (1, 1, 1) will be routed to the partition P1_1, invalidating the
materialized views of P1_1 and its ancestor partition tables. In contrast, the
other sub-partition P1_2 experiences no data changes, so the materialized view
MV_1_2 remains valid.
As a result, the materialized views MV_0, MV_1, and MV_1_1 are invalidated,
while MV_1_2 remains unaffected.
The new approach allows for status propagation only from the leaf nodes
upward, rather than sending status updates in both directions through the
intermediate nodes of the partition tree. This safeguards unrelated
materialized views from unnecessary refreshes.
### (Split) Across Partition Update on root table
```sql
UPDATE P0 SET c = 2 WHERE b = 2 AND c = 1;
```
#### before this commit:

When the data in the root node is updated, and since it is unclear which
specific child table was updated, the status is propagated downwards to all
descendant child tables.
As a result, all materialized views based on the partition tree are
invalidated: MV_0, MV_1, MV_1_1, MV_1_2, and MV_2_2.
#### after this commit:

Since `c` is the partition key, updating the partition key will cause data
movement **across partitions** in the child tables. The **UPDATE** operation on
the parent table will be translated into an **INSERT** for one child table and
a **DELETE** for another.
The system monitors the changed leaf node child tables: P2_1 and P2_2, and
only propagates their respective statuses upward, invalidating the materialized
views based on their ancestor tables. The materialized views related to the
left subtree P1 remain unaffected.
As a result, MV_0 and MV_2_2 are invalidated, while MV_1, MV_1_1, and MV_1_2
remain valid.
## Extend protocol of libpq
The executor (QE) records the modified partition child tables during the
data modification process. A mechanism is needed to transmit these results to
the QD for updating the metadata. The libpq protocol has been extended to
support this data transfer.
### Processing on the QE and QD sides:

Each QE records the IDs of the child tables it has modified, deduplicates
them in a local bitmap, and sends this information to the QD.
The data changes on each segment node may not be the same. For example, in
the diagram, seg0 has changes in two tables, while seg1 has no changes (as the
data to be updated is not located on the seg1 node due to distribution).
The QD collects multiple bitmaps returned from the segment nodes and
consolidates them into a single bitmap, removing duplicate relids. This
deduplicated bitmap is then used to update the status of the ancestor nodes.
### Extend Protocol Design

Based on libpq, the protocol has been extended with an **'e'** protocol,
which stands for "extend protocol," to unify the handling of information
returns.
Each transmission begins with the **'e'** protocol, followed immediately by
data blocks. A single transmission can carry multiple data blocks.
Each data block corresponds to a specific type of data and is distinguished
by a **subtag**. Following the subtag is a 32-bit integer, **num**, which
indicates the total number of relids in the subsequent array.
The relid array follows the num value. After all data blocks have been
written, the transmission concludes with a **subtag_max**.
The **subtag_max**, which belongs to the same enum category as the
**subtag**, is specifically used to indicate the termination of an extended
protocol transmission.
## Key changes:
- Leaf Partition Tracking: Instead of updating the MV status based on the
parent table, we now detect and track changes at the leaf partition level. This
ensures that only partitions with actual data modifications trigger MV status
updates, significantly reducing unnecessary refreshes.
- Cross-Partition Updates: For operations like UPDATEs that span multiple
partitions (e.g., decomposed into an INSERT on one leaf and a DELETE on
another), the MV status is updated based on the real data state of the affected
leaf partitions, rather than propagating changes through the parent table. This
avoids invalidating MVs that depend on unrelated partitions.
- Executor Enhancements: The executor now detects dynamic partition
expansion during query execution and records data modification bitmaps for each
partition on the QE. These bitmaps are aggregated on the QD to update MV
statuses accurately.
- Protocol Extension: The libpq protocol has been extended to handle QE
feedback uniformly, ensuring that partition-level modification information is
properly collected and consumed by the QD.
This optimization minimizes the impact on MVs by ensuring that only relevant
partitions trigger status updates, improving performance and reducing
unnecessary invalidations.
Authored-by: Zhang Mingli [email protected]
<!-- Thank you for your contribution to Apache Cloudberry (Incubating)! -->
Fixes #ISSUE_Number
### What does this PR do?
<!-- Brief overview of the changes, including any major features or fixes -->
### Type of Change
- [ ] Bug fix (non-breaking change)
- [ ] New feature (non-breaking change)
- [ ] Breaking change (fix or feature with breaking changes)
- [ ] Documentation update
### Breaking Changes
<!-- Remove if not applicable. If yes, explain impact and migration path -->
### Test Plan
<!-- How did you test these changes? -->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Passed `make installcheck`
- [ ] Passed `make -C src/test installcheck-cbdb-parallel`
### Impact
<!-- Remove sections that don't apply -->
**Performance:**
<!-- Any performance implications? -->
**User-facing changes:**
<!-- Any changes visible to users? -->
**Dependencies:**
<!-- New dependencies or version changes? -->
### Checklist
- [ ] Followed [contribution
guide](https://cloudberry.apache.org/contribute/code)
- [ ] Added/updated documentation
- [ ] Reviewed code for security implications
- [ ] Requested review from [cloudberry
committers](https://github.com/orgs/apache/teams/cloudberry-committers)
### Additional Context
<!-- Any other information that would help reviewers? Remove if none -->
### CI Skip Instructions
<!--
To skip CI builds, add the appropriate CI skip identifier to your PR title.
The identifier must:
- Be in square brackets []
- Include the word "ci" and either "skip" or "no"
- Only use for documentation-only changes or when absolutely necessary
-->
---
<!-- Join our community:
- Mailing list:
[[email protected]](https://lists.apache.org/[email protected])
(subscribe: [email protected])
- Discussions: https://github.com/apache/cloudberry/discussions -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]