sandy-bes opened a new issue, #2356:
URL: https://github.com/apache/age/issues/2356
**Describe the bug**
A severe performance degradation is observed when executing queries that use
the `@>` operator.
**Importantly:** the issue manifests across all workloads (not just `SELECT`
/ reads, but also during inserts and updates). Any operation that uses `MATCH`
with property filtering (e.g., `MATCH (n {id: ...}) CREATE ...`) invokes this
operator under the hood and suffers from this overhead.
Flame graph analysis revealed that the performance drop occurs during the
query planning phase, which now consumes about 30% of the execution time within
`agtype_contains`. The root cause lies in the altered definition of the `@>`
operator (as well as the containment and existence operators) in AGE version
18. Specifically, the selectivity estimation functions (`RESTRICT` and `JOIN`)
were replaced. The new `matchingsel` function provides more accurate estimates
based on statistics, but due to its complex internal logic, it takes
significantly longer to execute than the old, lightweight `contsel`.
For simple point queries (e.g., finding a node by ID) executed thousands of
times per second, this "accurate" estimation yields no execution benefit but
creates a critical parasitic load during query planning.
**How are you accessing AGE (Command line, driver, etc.)?**
psql (Command line)
**What data setup do we need to do?**
To reproduce the issue, I have prepared a set of SQL scripts (attached to
the report).
1. **Graph creation and population:**
The `sf` parameter here defines the scale factor of the generated graph.
psql -d your_database -f generate_graph.sql -v sf=1
[generate_graph.sql](https://github.com/user-attachments/files/26109641/generate_graph.sql)
2. **Creating wrapper functions for the workload:**
This script sets up the environment and creates functions with the correct
parameter mapping for Apache AGE.
psql -d your_database -f setup_func_for_workload.sql
[setup_func_for_workload.sql](https://github.com/user-attachments/files/26109616/setup_func_for_workload.sql)
3. **Running the load test:**
Running 8 clients for 120 seconds.
pgbench -d your_database -f workload_select.sql -D sf=1 -c 8 -T 120 -P 5
[workload_select.sql](https://github.com/user-attachments/files/26109624/workload_select.sql)
**What is the necessary configuration info needed?**
Standard PostgreSQL configuration.
**What is the command that caused the error?**
In this case, it is not a parsing error, but a planner issue. The slowdown
occurs on any simple query with property filtering (Cypher translates this to
the `@>` operator under the hood):
```pgsql
EXPLAIN ANALYZE
SELECT * FROM cypher('snb_graph', $$
MATCH (m:Post {id: 123})
RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content')
UNION ALL
MATCH (m:Comment {id: 123})
RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content')
$$) AS (
creationDate agtype,
content agtype
);
```
**Expected behavior**
Planning for simple point queries is expected to be nearly instantaneous,
without significant overhead. Reverting to the lightweight selectivity
estimation functions (or optimizing `matchingsel` for cases where gathering
deep statistics is impractical) should restore performance to the AGE 17
version level.
**Environment (please complete the following information):**
* PostgreSQL Version: 18
* AGE Version: 18
**Additional context**
In the operator definition, `RESTRICT` specifies the function that estimates
operator selectivity in a standard `WHERE` clause, while `JOIN` does so for
join conditions. Replacing `contsel` with `matchingsel` forces complex
statistics-gathering logic to be invoked during the planning of every minor
query, which destroys TPS.
Comparison of the `@>` operator DDL across versions:
**AGE Version 17 (Fast planning):**
```pgsql
CREATE OPERATOR @> (
LEFTARG = agtype,
RIGHTARG = agtype,
FUNCTION = ag_catalog.agtype_contains,
COMMUTATOR = '<@',
RESTRICT = contsel, -- Standard, lightweight function with fixed
selectivity
JOIN = contjoinsel
);
```
**AGE Version 18 (Planning degradation):**
```pgsql
CREATE OPERATOR @> (
LEFTARG = agtype,
RIGHTARG = agtype,
FUNCTION = ag_catalog.agtype_contains,
COMMUTATOR = '<@',
RESTRICT = matchingsel, -- More accurate, but extremely "heavy" function
JOIN = matchingjoinsel
);
```
**Evidence and Testing:**
**1. FlameGraph Analysis**
As seen in the FlameGraphs below, the planning phase in `agtype_contains`
consumes a massive portion of CPU time:
* Version using `matchingsel`:
<img width="1525" height="926" alt="Image"
src="https://github.com/user-attachments/assets/dc7002d2-80a4-4a1d-90df-03fd8f2daa2d"
/>
* Version using `contsel`:
<img width="1525" height="926" alt="Image"
src="https://github.com/user-attachments/assets/682e4c40-38cc-4d86-b43c-38a7258c8438"
/>
**2. TPS Comparison Charts:**
The TPS charts below demonstrate the difference between using `matchingsel`
and `contsel`. Throughput drops critically with the new selectivity estimation:
* Version using `matchingsel`:
<img width="1841" height="839" alt="Image"
src="https://github.com/user-attachments/assets/522553e6-e849-412f-b08d-d5c57501f15c"
/>
* Version using `contsel`:
<img width="1841" height="839" alt="Image"
src="https://github.com/user-attachments/assets/8f7706a0-57e9-4bb2-902d-fea546585e14"
/>
**3. Reverting the logic (Proof of Concept)**
We verified this hypothesis in practice. If `matchingsel` is manually
replaced back with `contsel` in the 18th version's code, the performance
degradation completely disappears.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]