[I] Critical TPS drop in PG 18 branch caused by matchingsel selectivity for @> operator [age]

via GitHub Thu, 19 Mar 2026 01:28:41 -0700


sandy-bes opened a new issue, #2356:
URL: https://github.com/apache/age/issues/2356


   **Describe the bug**
   A severe performance degradation is observed when executing queries that use 
the `@>` operator. 
   
   **Importantly:** the issue manifests across all workloads (not just `SELECT` 
/ reads, but also during inserts and updates). Any operation that uses `MATCH` 
with property filtering (e.g., `MATCH (n {id: ...}) CREATE ...`) invokes this 
operator under the hood and suffers from this overhead.
   
   Flame graph analysis revealed that the performance drop occurs during the 
query planning phase, which now consumes about 30% of the execution time within 
`agtype_contains`. The root cause lies in the altered definition of the `@>` 
operator (as well as the containment and existence operators) in AGE version 
18. Specifically, the selectivity estimation functions (`RESTRICT` and `JOIN`) 
were replaced. The new `matchingsel` function provides more accurate estimates 
based on statistics, but due to its complex internal logic, it takes 
significantly longer to execute than the old, lightweight `contsel`.
   
   For simple point queries (e.g., finding a node by ID) executed thousands of 
times per second, this "accurate" estimation yields no execution benefit but 
creates a critical parasitic load during query planning.
   
   **How are you accessing AGE (Command line, driver, etc.)?**
   psql (Command line)    
   
   **What data setup do we need to do?**
   To reproduce the issue, I have prepared a set of SQL scripts (attached to 
the report).
   
   1. **Graph creation and population:**
   The `sf` parameter here defines the scale factor of the generated graph.
   
   psql -d your_database -f generate_graph.sql -v sf=1
   
[generate_graph.sql](https://github.com/user-attachments/files/26109641/generate_graph.sql)
   
   
   2. **Creating wrapper functions for the workload:**
   This script sets up the environment and creates functions with the correct 
parameter mapping for Apache AGE.
   
   psql -d your_database -f setup_func_for_workload.sql 
   
[setup_func_for_workload.sql](https://github.com/user-attachments/files/26109616/setup_func_for_workload.sql)
   
   3. **Running the load test:**
   Running 8 clients for 120 seconds.
   
   pgbench -d your_database -f workload_select.sql -D sf=1 -c 8  -T 120 -P 5
   
[workload_select.sql](https://github.com/user-attachments/files/26109624/workload_select.sql)
   
   **What is the necessary configuration info needed?**
   Standard PostgreSQL configuration.
   
   **What is the command that caused the error?**
   In this case, it is not a parsing error, but a planner issue. The slowdown 
occurs on any simple query with property filtering (Cypher translates this to 
the `@>` operator under the hood):
   
   ```pgsql
   EXPLAIN ANALYZE
   SELECT * FROM cypher('snb_graph', $$ 
       MATCH (m:Post {id: 123}) 
       RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content') 
       UNION ALL 
       MATCH (m:Comment {id: 123}) 
       RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content')
   $$) AS ( 
       creationDate agtype, 
       content agtype 
   );
   ```
   
   **Expected behavior**
   Planning for simple point queries is expected to be nearly instantaneous, 
without significant overhead. Reverting to the lightweight selectivity 
estimation functions (or optimizing `matchingsel` for cases where gathering 
deep statistics is impractical) should restore performance to the AGE 17 
version level.
   
   **Environment (please complete the following information):**
   * PostgreSQL Version: 18
   * AGE Version: 18
   
   **Additional context**
   In the operator definition, `RESTRICT` specifies the function that estimates 
operator selectivity in a standard `WHERE` clause, while `JOIN` does so for 
join conditions. Replacing `contsel` with `matchingsel` forces complex 
statistics-gathering logic to be invoked during the planning of every minor 
query, which destroys TPS.
   
   Comparison of the `@>` operator DDL across versions:
   
   **AGE Version 17 (Fast planning):**
   ```pgsql
   CREATE OPERATOR @> (
     LEFTARG = agtype,
     RIGHTARG = agtype,
     FUNCTION = ag_catalog.agtype_contains,
     COMMUTATOR = '<@',
     RESTRICT = contsel,     -- Standard, lightweight function with fixed 
selectivity
     JOIN = contjoinsel
   );
   ```
   
   **AGE Version 18 (Planning degradation):**
   ```pgsql
   CREATE OPERATOR @> (
     LEFTARG = agtype,
     RIGHTARG = agtype,
     FUNCTION = ag_catalog.agtype_contains,
     COMMUTATOR = '<@',
     RESTRICT = matchingsel, -- More accurate, but extremely "heavy" function
     JOIN = matchingjoinsel
   );
   ```
   
   **Evidence and Testing:**
   **1. FlameGraph Analysis**
   As seen in the FlameGraphs below, the planning phase in `agtype_contains` 
consumes a massive portion of CPU time:
   
   * Version using `matchingsel`: 
   
   <img width="1525" height="926" alt="Image" 
src="https://github.com/user-attachments/assets/dc7002d2-80a4-4a1d-90df-03fd8f2daa2d";
 />
   
   * Version using `contsel`:
   
   <img width="1525" height="926" alt="Image" 
src="https://github.com/user-attachments/assets/682e4c40-38cc-4d86-b43c-38a7258c8438";
 />
   
   **2. TPS Comparison Charts:**
   The TPS charts below demonstrate the difference between using `matchingsel` 
and `contsel`. Throughput drops critically with the new selectivity estimation:
   
   * Version using `matchingsel`:
   
   <img width="1841" height="839" alt="Image" 
src="https://github.com/user-attachments/assets/522553e6-e849-412f-b08d-d5c57501f15c";
 />
   
   * Version using `contsel`:
   
   <img width="1841" height="839" alt="Image" 
src="https://github.com/user-attachments/assets/8f7706a0-57e9-4bb2-902d-fea546585e14";
 />
   
   **3. Reverting the logic (Proof of Concept)**
   We verified this hypothesis in practice. If `matchingsel` is manually 
replaced back with `contsel` in the 18th version's code, the performance 
degradation completely disappears.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] Critical TPS drop in PG 18 branch caused by matchingsel selectivity for @> operator [age]

Reply via email to