edespino opened a new issue, #1395:
URL: https://github.com/apache/cloudberry/issues/1395
### Apache Cloudberry version
PostgreSQL 14.4 (Apache Cloudberry 3.0.0-devel+dev.2138.g37fce691d3d build
dev)
### What happened
**Report for:** Apache Cloudberry (Incubating) Development Community
**Date:** October 15, 2025
**Severity:** HIGH
**Category:** Memory Management / PostGIS Integration
**Status:** Reproducible (100% consistent)
---
## Executive Summary
PostGIS 3.3.2 exhibits critical memory corruption when executing distributed
queries with spatial predicates in Apache Cloudberry Database. The issue
manifests as a `MemoryContextContains` assertion failure at `mcxt.c:933` during
cross-segment geometry operations.
**Discovery Context:** This issue was discovered during an attempt to
validate the PostGIS 3.3.2 release by executing its embedded regression test
suite against Apache Cloudberry Database. The test suite revealed consistent
crashes when distributed queries involve geometry operations with spatial
predicates.
**Root Cause:** The PostGIS geometry cache (`shared_gserialized_ref`) assumes
single-process memory management and fails when geometries cross segment
boundaries via motion nodes in distributed queries.
**Impact:**
- ❌ **CRASH:** Cross-segment joins with spatial predicates (ST_Contains,
ST_Intersection, ST_Intersects)
- ❌ **CRASH:** Large TOAST geometries in distributed queries
- ✅ **WORKS:** Single-segment queries (all operations work correctly)
- ✅ **WORKS:** Non-distributed geometry operations
---
## Environment Details
### System Configuration
```
OS: Rocky Linux 9.6 (5.14.0-570.17.1.el9_6.x86_64)
Compiler: gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Architecture: x86_64
```
### Software Versions
```
Database: PostgreSQL 14.4 (Apache Cloudberry
3.0.0-devel+dev.2138.g37fce691d3d build dev)
Build Type: Debug build with assertions enabled (compiled Oct 14 2025
13:56:48)
PostGIS: 3.3.2 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
GEOS: 3.11.0-CAPI-1.17.0
PROJ: 6.0.0 (March 1st, 2019)
GDAL: 3.5.3
```
### Cluster Configuration
```
Topology: Standard demo cluster (1 coordinator + 3 segment servers)
Port: 7000 (coordinator), 7002-7004 (segments)
Created via: make create-demo-cluster
```
---
## Reproduction Steps
### Prerequisites
1. **Build Cloudberry Database with assertions:**
```bash
git clone https://github.com/apache/cloudberry.git
cd cloudberry
./configure --enable-cassert --enable-debug --with-python \
    --prefix=/usr/local/cloudberry
make -j$(nproc)
make install
```
2. **Install PostGIS and dependencies:**
```bash
# Install geospatial dependency stack
# CGAL 5.6.1 → SFCGAL 1.4.1 → GEOS 3.11.0 → PROJ 6.0.0 → GDAL 3.5.3 → PostGIS 3.3.2
# Or use your preferred PostGIS installation method
# Ensure PostGIS is built against Cloudberry's pg_config
```
3. **Create demo cluster:**
```bash
cd gpAux/gpdemo
make create-demo-cluster
source gpdemo-env.sh
```
### Test Case 1: Cross-Segment ST_Contains (Crashes)
```sql
-- Connect to the coordinator from a shell first: psql -p 7000 -d postgres
-- Create test database
DROP DATABASE IF EXISTS distributed_crash_test;
CREATE DATABASE distributed_crash_test;
\c distributed_crash_test
CREATE EXTENSION postgis;
-- Create distributed table with geometries
CREATE TABLE distributed_geoms (
id SERIAL,
name TEXT,
geom GEOMETRY(POINT, 4326)
) DISTRIBUTED BY (id);
-- Insert data across segments
INSERT INTO distributed_geoms (name, geom)
SELECT
'point_' || n,
ST_GeomFromText('POINT(' || (n % 180 - 90)::text || ' ' || (n % 90 - 45)::text || ')', 4326)
FROM generate_series(1, 100) n;
-- Execute cross-segment join with ST_Contains
-- THIS WILL CRASH THE DATABASE SEGMENT
SELECT
a.id,
COUNT(b.id) as contained_points
FROM distributed_geoms a, distributed_geoms b
WHERE a.id <= 5 AND ST_Contains(ST_Buffer(a.geom, 10), b.geom)
GROUP BY a.id
ORDER BY a.id;
```
**Expected Result:** Database segment crashes with:
```
ERROR: Unexpected internal error (assert.c:48) (seg0 slice1 10.0.1.15:7002 pid=XXXXX)
DETAIL: FailedAssertion("false", File: "mcxt.c", Line: 933)
```
### Test Case 2: TOAST Geometries with ST_Intersection (Crashes)
```sql
-- Create table with large TOAST geometries
CREATE TABLE toast_geoms (
id SERIAL,
large_geom GEOMETRY
) DISTRIBUTED BY (id);
-- Insert large polygons (will be TOASTed due to size)
INSERT INTO toast_geoms (large_geom)
SELECT
ST_Buffer(
ST_GeomFromText('POINT(' || n::text || ' ' || n::text || ')', 4326),
5.0,
100 -- 100 segments per quarter circle = large geometry that gets TOASTed
)
FROM generate_series(1, 20) n;
-- Execute cross-segment intersection
-- THIS WILL CRASH THE DATABASE SEGMENT
SELECT
a.id,
ST_Area(ST_Intersection(a.large_geom, b.large_geom)) as intersection_area
FROM toast_geoms a, toast_geoms b
WHERE a.id < b.id
AND a.id <= 5
AND b.id <= 10
AND ST_Intersects(a.large_geom, b.large_geom)
LIMIT 10;
```
**Expected Result:** Database segment crashes with the same `mcxt.c:933`
assertion failure.
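The claim that these buffers are large enough to be TOASTed can be checked with rough arithmetic. The sketch below assumes that buffering a point yields about `4 * quad_segs + 1` vertices in a closed ring and that each 2D vertex costs two doubles; it ignores the geometry header and any compression. The function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: approximate vertex count of a buffered point,
 * assuming quad_segs segments per quarter circle plus a closing vertex. */
size_t point_buffer_vertices(unsigned quad_segs)
{
    return 4u * (size_t) quad_segs + 1u;
}

/* Hypothetical helper: bytes needed for the X/Y coordinates alone. */
size_t ring_coordinate_bytes(size_t vertices)
{
    return vertices * 2u * sizeof(double);
}
```

Under these assumptions, `quad_segs = 100` gives 401 vertices and roughly 6.4 kB of coordinates, which is above the approximately 2 kB tuple size at which PostgreSQL starts TOAST processing by default. Whether a given Cloudberry build uses the same threshold is an assumption here.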
### Test Case 3: Single-Segment Query (Works)
```sql
-- This query works because every segment holds a full copy of a replicated
-- table, so the join can run locally without a motion node
CREATE TABLE replicated_geoms (
id INT,
geom GEOMETRY(POINT, 4326)
) DISTRIBUTED REPLICATED;
INSERT INTO replicated_geoms (id, geom)
SELECT
n,
ST_GeomFromText('POINT(' || n::text || ' ' || n::text || ')', 4326)
FROM generate_series(1, 100) n;
-- This works correctly - no crash
SELECT
a.id,
COUNT(b.id) as contained_points
FROM replicated_geoms a, replicated_geoms b
WHERE a.id <= 5 AND ST_Contains(ST_Buffer(a.geom, 10), b.geom)
GROUP BY a.id
ORDER BY a.id;
```
**Expected Result:** Query completes successfully (no crash).
---
## Core Dump Analysis
### Crash Location
```
Program terminated with signal SIGABRT, Aborted.
Core was generated by `postgres: 7002, cbadmin distributed_crash_test 10.0.1.15(51960) con93 seg0 cmd'.
```
### Complete Stack Trace (With Enhanced Debug Symbols)
```gdb
Thread 1 (Thread 0x7f3efbecb8c0 (LWP 211661)):
#0 0x00007f3efe28bedc in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f3efe23eb46 in raise () from /lib64/libc.so.6
#2 0x00007f3efe228833 in abort () from /lib64/libc.so.6
#3 0x00000000011299ae in ExceptionalCondition (
conditionName=0x1893705 "false",
errorType=0x18933a9 "FailedAssertion",
fileName=0x1893420 "mcxt.c",
lineNumber=933) at assert.c:48
__func__ = "ExceptionalCondition"
#4 0x000000000118098e in MemoryContextContains (
context=0x86e4820,
pointer=0x8407e40) at mcxt.c:933
ptr_context = <optimized out>
#5 0x00007f3ee583d031 in shared_gserialized_ref (
fcinfo=0x86e7530,
ref=0x8407e40) at shared_gserialized.c:50
# No locals visible (assertion before locals set)
#6 0x00007f3ee583b2e2 in GetGeomCache (
fcinfo=0x86e7530,
cache_methods=0x7f3ee591b220 <PrepGeomCacheMethods>,
g1=0x8407e40, # ← First geometry pointer (from motion node)
g2=0x8407e68) at lwgeom_cache.c:184 # ← Second geometry pointer
# LOCAL VARIABLES WITH DEBUG SYMBOLS:
cache = 0x86a0290 # Geometry cache structure
cache_hit = 0 # Cache miss (0 = miss, 1 = hit)
old_context = 0x8407d00 # Previous memory context
geom = 0x7f3ee587a401 # Current geometry being cached
generic_cache = 0x86e7588 # Generic cache entry
entry_number = 1 # Cache slot number
#7 0x00007f3ee579b19f in GetPrepGeomCache (
fcinfo=0x86e7530,
g1=0x8407e40,
g2=0x8407e68) at lwgeom_geos_prepared.c:372
#8 0x00007f3ee57972e0 in ST_Intersects (
fcinfo=0x86e7530) at lwgeom_geos.c:2296
# GEOMETRY POINTERS:
shared_geom1 = 0x8407e40 # First geometry (from segment A)
shared_geom2 = 0x8407e68 # Second geometry (from segment B)
geom1 = 0x7f3ee7514620 # Deserialized geometry 1
geom2 = 0x869e230 # Deserialized geometry 2
result = 0 # Intersection result (not computed)
# BOUNDING BOX DETAILS (Enhanced Debug Info):
box1 = {
flags = 4, # Geometry flags
xmin = -4, # Minimum X coordinate
xmax = 6, # Maximum X coordinate
ymin = -4, # Minimum Y coordinate
ymax = 6, # Maximum Y coordinate
zmin = 6.988574765777616e-316, # Z coordinates (2D geometry)
zmax = 6.9885656749697325e-316,
mmin = 6.9740780892236004e-316, # M coordinates (not used)
mmax = 6.9740780398170359e-316
}
box2 = {
flags = 4,
xmin = 0, # Geometry 2 extent
xmax = 10,
ymin = 0,
ymax = 10,
zmin = 9.0686371816874563e-317,
zmax = 4.3123702040745705e-314,
mmin = 6.9123997658947312e-310,
mmax = 6.9123998372278303e-310
}
prep_cache = 0xc03e8e # Prepared geometry cache entry
__func__ = "ST_Intersects"
#9 0x0000000000bdedc2 in ExecInterpExpr (
state=0x86978a8,
econtext=0x86e5370,
isnull=0x7ffe09dcc667) at execExprInterp.c:760
fcinfo = 0x86e7530 # Function call info
args = 0x86e7550 # Function arguments
nargs = 2 # Number of arguments
d = 1
op = 0x86e6ed0 # Operation code
resultslot = 0x0
innerslot = 0x86e5c58 # Inner tuple slot (segment B data)
outerslot = 0x86e58e8 # Outer tuple slot (segment A data)
scanslot = 0x0
#10 0x0000000000c4b7cc in ExecEvalExprSwitchContext (
state=0x86978a8,
econtext=0x86e5370,
isNull=0x7ffe09dcc667) at executor.h:382
retDatum = 0
oldContext = 0x86e4820 # Previous context before switch
#11 0x0000000000c4b8d6 in ExecQual (
state=0x86978a8,
econtext=0x86e5370) at executor.h:451
ret = 141124264
isnull = false
#12 0x0000000000c4c02c in ExecNestLoop_guts (
pstate=0x86e5158) at nodeNestloop.c:337
node = 0x86e5158 # Nested loop join node
nl = 0x84dfe50 # Nested loop state
innerPlan = 0x86962a8 # Inner plan (segment B)
outerPlan = 0x86e5460 # Outer plan (segment A)
outerTupleSlot = 0x86e58e8 # Current outer tuple
innerTupleSlot = 0x86e5c58 # Current inner tuple
joinqual = 0x86978a8 # Join qualification (ST_Intersects)
otherqual = 0x0
econtext = 0x86e5370 # Expression context
...
#22 0x0000000000c6c5d9 in execMotionSender (
node=0x86e4ba0) at nodeMotion.c:230
outerTupleSlot = 0x30203020312d2020 # Tuple being sent
outerNode = 0x86e4ea8 # Source node
motion = 0x84df800 # Motion node structure
done = false
__func__ = "execMotionSender"
^^^ MOTION NODE - GEOMETRIES CROSSING SEGMENT BOUNDARY ^^^
^^^ THIS IS WHERE MEMORY CONTEXT DIVERGENCE OCCURS ^^^
#23 0x0000000000c6c482 in ExecMotion (
pstate=0x86e4ba0) at nodeMotion.c:193
node = 0x86e4ba0
motion = 0x84df800
__func__ = "ExecMotion"
...
#27 0x0000000000befcc9 in ExecutePlan (
estate=0x86e4960,
planstate=0x86e4ba0,
use_parallel_mode=false,
operation=CMD_SELECT,
sendTuples=false,
numberTuples=0,
direction=ForwardScanDirection,
dest=0x853efe8,
execute_once=true) at execMain.c:2772
```
**Key Insights from Enhanced Debug Symbols:**
1. **Memory Context Mismatch:** the check was called with `context=0x86e4820`
and `pointer=0x8407e40`
- The geometry at `pointer` was allocated in a different context than the one
the cache expects
2. **Cache State:** `cache_hit = 0` indicates cache miss, triggering
validation
3. **Geometry Coordinates:** Bounding boxes show actual geometry extents:
- Geometry 1: `(-4, -4)` to `(6, 6)` - buffered geometry from segment A
- Geometry 2: `(0, 0)` to `(10, 10)` - original geometry from segment B
4. **Nested Loop Join:** Inner/outer tuple slots point to data from
different segments
5. **Motion Node:** Frame #22 shows where geometries cross segment boundaries
### Critical Code Path
The crash occurs when:
1. **Query coordinator** creates prepared geometry cache with
`shared_gserialized_ref`
2. **Motion node** serializes geometry and sends to segment servers
3. **Segment server** deserializes geometry into its own memory context
4. **PostGIS cache** tries to validate memory context with
`MemoryContextContains`
5. **Assertion fails** because pointer is not in expected context
### Memory Context Information
From `shared_gserialized.c:50`:
```c
static inline GSERIALIZED *
shared_gserialized_ref(FunctionCallInfo fcinfo, GSERIALIZED *ref)
{
MemoryContext context = fcinfo->flinfo->fn_mcxt;
// THIS ASSERTION FAILS IN DISTRIBUTED QUERIES:
Assert(MemoryContextContains(context, ref));
return ref;
}
```
The assertion at `mcxt.c:933` is:
```c
bool
MemoryContextContains(MemoryContext context, void *pointer)
{
MemoryContext ptr_context = GetMemoryChunkContext(pointer);
// FAILS: pointer's context != expected context
Assert(ptr_context == context);
...
}
```
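To make the ownership check concrete, here is a minimal, self-contained sketch of the idea: every allocation carries a small header naming the context that owns it, and the check compares that header against the expected context. This is a toy model, not PostgreSQL's or Cloudberry's allocator; `ToyContext`, `toy_alloc`, `toy_free`, and `toy_context_contains` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory context (hypothetical, not the real struct). */
typedef struct ToyContext {
    const char *name;
} ToyContext;

/* Header stored immediately before each chunk's payload. */
typedef union ToyChunkHeader {
    ToyContext *owner;   /* context that allocated the chunk */
    max_align_t align;   /* keeps the payload suitably aligned */
} ToyChunkHeader;

/* Allocate size bytes and record ctx as the owner in the header. */
void *toy_alloc(ToyContext *ctx, size_t size)
{
    ToyChunkHeader *hdr = malloc(sizeof(ToyChunkHeader) + size);
    if (hdr == NULL)
        return NULL;
    hdr->owner = ctx;
    return hdr + 1;
}

/* Release a chunk obtained from toy_alloc. */
void toy_free(void *ptr)
{
    if (ptr != NULL)
        free((ToyChunkHeader *) ptr - 1);
}

/* Ownership test: does the chunk's header name ctx as its owner? */
bool toy_context_contains(const ToyContext *ctx, const void *ptr)
{
    const ToyChunkHeader *hdr;

    if (ptr == NULL)
        return false;
    hdr = (const ToyChunkHeader *) ptr - 1;
    return hdr->owner == ctx;
}
```

In this model, a chunk allocated with `toy_alloc(&executor_ctx, n)` passes `toy_context_contains(&executor_ctx, p)` and fails the same check against any other context, which mirrors the mismatch reported in frame #4.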
### Register State at Crash
```
rax 0x0 0
rbx 0x7f8f4d6828c0 140253455722688
rcx 0x7f8f4fa8bedc 140253493509852
rdx 0x6 6 (SIGABRT)
rsi 0xcbdc 52188 (PID)
rdi 0xcbdc 52188
rip 0x7f8f4fa8bedc __pthread_kill_implementation+284
```
### Loaded PostGIS Libraries
```
/usr/local/cloudberry/lib/postgresql/postgis_raster-3.so
/usr/local/geos-3.11.0/lib64/libgeos.so.3.11.0
/usr/local/gdal-3.5.3/lib/libgdal.so.31.0.3
```
---
## Root Cause Analysis
### Technical Deep Dive
The PostGIS geometry cache implementation makes assumptions about memory
contexts that are violated in distributed database architectures:
**Single-Process Assumption (PostgreSQL/standalone):**
```
┌─────────────────────┐
│   Backend Process   │
│                     │
│  ┌──────────────┐   │
│  │Memory Context│   │
│  │              │   │
│  │ Geometry ────┼───┼──> Cache validates: OK ✓
│  │ Cache        │   │
│  └──────────────┘   │
└─────────────────────┘
```
**Distributed Reality (Cloudberry/Greenplum):**
```
┌─────────────────────┐      Motion Node        ┌─────────────────────┐
│ Coordinator         │      (Serialize)        │ Segment Server      │
│                     │ ──────────────────────> │                     │
│  ┌──────────────┐   │                         │  ┌──────────────┐   │
│  │Context A     │   │                         │  │Context B     │   │
│  │              │   │                         │  │              │   │
│  │ Geometry ────┼───┼─> Sent to segment       │  │ Geometry ────┼───┼─> Cache validates: FAIL ✗
│  │              │   │                         │  │ (different   │   │   (pointer not in Context B)
│  └──────────────┘   │                         │  │  context)    │   │
└─────────────────────┘                         │  └──────────────┘   │
                                                └─────────────────────┘
```
### Why Single-Segment Queries Work
When using `DISTRIBUTED REPLICATED` tables or single-segment operations:
- Each segment evaluates the join against data it already holds locally
- No motion nodes involved
- No cross-context geometry references
- Cache validation succeeds
### Why Cross-Segment Queries Crash
When executing distributed joins:
- Geometries cross segment boundaries via motion nodes
- Memory contexts differ between coordinator and segments
- PostGIS cache assumes single-process memory model
- `MemoryContextContains` assertion fails
---
## Is This a Real Issue or Just an Assertion Artifact?
**This is a genuine memory safety bug**, not an assertion false positive.
**What assertions do:** The `--enable-cassert` flag enables developer safety
checks that validate assumptions in the code. The failing assertion
`MemoryContextContains(context, ref)` detects that PostGIS is using a pointer
from the wrong memory context - a violation that "should never happen"
according to PostgreSQL's memory management design.
**Without assertions (production builds):**
- No immediate crash at mcxt.c:933, but the underlying bug still exists
- Leads to **use-after-free** when memory contexts are destroyed
- Results in **intermittent crashes** at unpredictable locations
- Potential **silent data corruption** when accessing freed/recycled memory
- **Memory leaks** when cleanup code can't find objects in expected contexts
- Much harder to diagnose - random crashes instead of clear error messages
**Why this is serious:** PostgreSQL's memory context system is fundamental
to safe memory management. Each context owns its allocated memory, and
destroying a context frees everything in it atomically. Using pointers across
context boundaries violates memory ownership guarantees and can cause access to
freed memory - equivalent to classic use-after-free bugs in C.
**Real-world analogy:**
- Using a pointer after `free()` in C
- Accessing an object after its destructor runs in C++
- Using memory from a different thread without synchronization
The assertion is doing its job: **catching a serious architectural
incompatibility early, at the point of violation, before it causes
unpredictable failures in production**.
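The lifetime hazard described above can be illustrated with a toy arena in which destroying the arena frees every chunk it owns. The sketch also shows one defensive pattern, stated here as an assumption rather than as what PostGIS does: when a cache is handed a pointer it does not own, it copies the bytes into its own arena instead of keeping the foreign pointer. All names (`ToyArena`, `arena_alloc`, `arena_contains`, `arena_destroy`, `cache_ref`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy arena standing in for a memory context (not PostgreSQL's). */
typedef struct ToyArena ToyArena;

typedef struct ToyChunk {
    struct ToyChunk *next;   /* next chunk owned by the same arena */
    ToyArena *owner;         /* arena that allocated this chunk */
    max_align_t payload[];   /* caller-visible bytes start here */
} ToyChunk;

struct ToyArena {
    ToyChunk *chunks;        /* singly linked list of live chunks */
};

/* Allocate size bytes owned by arena. */
void *arena_alloc(ToyArena *arena, size_t size)
{
    ToyChunk *chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        return NULL;
    chunk->owner = arena;
    chunk->next = arena->chunks;
    arena->chunks = chunk;
    return chunk->payload;
}

/* True if ptr was returned by arena_alloc(arena, ...). */
bool arena_contains(const ToyArena *arena, const void *ptr)
{
    const ToyChunk *chunk;

    if (ptr == NULL)
        return false;
    chunk = (const ToyChunk *)
        ((const char *) ptr - offsetof(ToyChunk, payload));
    return chunk->owner == arena;
}

/* Free every chunk the arena owns, all at once. */
void arena_destroy(ToyArena *arena)
{
    ToyChunk *chunk = arena->chunks;
    while (chunk != NULL) {
        ToyChunk *next = chunk->next;
        free(chunk);
        chunk = next;
    }
    arena->chunks = NULL;
}

/* Keep ptr if the cache arena owns it; otherwise copy it into the cache
 * arena so the cached value outlives the arena it came from. */
void *cache_ref(ToyArena *cache_arena, void *ptr, size_t size)
{
    void *copy;

    if (ptr == NULL)
        return NULL;
    if (arena_contains(cache_arena, ptr))
        return ptr;
    copy = arena_alloc(cache_arena, size);
    if (copy != NULL)
        memcpy(copy, ptr, size);
    return copy;
}
```

With this pattern, a value first allocated in a short-lived per-tuple arena remains readable through the cache after `arena_destroy` runs on the per-tuple arena, because the cache holds its own copy. Keeping the original pointer instead would leave the cache pointing at freed memory.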
---
## Impact Assessment
### Affected Operations
| Operation | Single-Segment | Cross-Segment | Notes |
|-----------|----------------|---------------|-------|
| ST_Contains | ✅ Works | ❌ **CRASHES** | Prepared geometry cache |
| ST_Intersection | ✅ Works | ❌ **CRASHES** | TOAST + motion nodes |
| ST_Intersects | ✅ Works | ❌ **CRASHES** | Used in WHERE clauses |
| ST_Within | ✅ Works | ❌ **CRASHES** | Prepared predicates |
| ST_DWithin | ✅ Works | ❌ **CRASHES** | Distance predicates |
| ST_Buffer | ✅ Works | ✅ Works | No cache involved |
| ST_AsText | ✅ Works | ✅ Works | Simple conversion |
| ST_Area | ✅ Works | ✅ Works | Measurement function |
| ST_Union (agg) | ✅ Works | ⚠️ **May crash** | If crosses segments |
### Use Cases Affected
**❌ HIGH IMPACT:**
- Spatial analytics requiring cross-segment joins
- Large-scale geospatial data warehousing
- Distributed spatial queries with predicates
- Production applications using distributed geometry tables
**✅ LOW IMPACT:**
- Single-table queries on replicated geometry data
- Simple geometry transformations without joins
- Point-in-time geometry operations
- Non-distributed PostGIS operations
### Production Recommendations
**DO NOT USE in production:**
- Distributed tables with geometry columns + cross-segment joins
- Spatial predicates (ST_Contains, ST_Intersects) in WHERE clauses with
distributed data
- Large TOAST geometries in distributed queries
**SAFE to use:**
- DISTRIBUTED REPLICATED tables for small geometry reference data
- Single-segment geometry operations
- Simple geometry functions without prepared caches
- Geometry operations in application layer (not database)
---
## Test Files for Reproduction
Complete test suite available at:
### Basic Validation Test
**File:** `postgis-basic-validation-test.sql`
**Purpose:** Validates basic PostGIS operations work correctly
**Result:** ✅ All tests pass - **NO CRASHES**
### Intensive Stress Test
**File:** `postgis-intensive-raster-test.sql`
**Purpose:** 85+ sequential raster operations to test state accumulation
**Result:** ✅ All tests pass - **NO CRASHES** (no distributed operations)
### Distributed Crash Test (ONLY FILE THAT CRASHES)
**File:** `postgis-distributed-crash-test.sql`
**Purpose:** **Reproduces memory corruption in distributed queries**
**Result:** ❌ **CRASHES on Tests 5 and 7** (expected)
**⚠️ IMPORTANT:** This is the **ONLY test file that reproduces the crash**.
The crash only occurs with distributed queries involving cross-segment geometry
operations.
**Test Series Breakdown:**
- Tests 1-4: Basic distributed operations ✅ PASS
- **Test 5:** ST_Contains with cross-segment join ❌ **CRASHES** (mcxt.c:933)
- Test 6: Complex mixed operations ✅ PASS
- **Test 7:** ST_Intersection with TOAST geometries ❌ **CRASHES**
(mcxt.c:933)
- Tests 8-10: Other distributed scenarios ✅ PASS
### Running the Test Suite
```bash
# Basic validation - DOES NOT CRASH
psql postgres -f postgis-basic-validation-test.sql
# Intensive stress test - DOES NOT CRASH
psql postgres -f postgis-intensive-raster-test.sql
# Distributed crash reproduction - CRASHES ON TESTS 5 AND 7
psql postgres -f postgis-distributed-crash-test.sql
```
Test files attached to this issue.
---
## Automated Crash Detection
The test framework includes automated crash pattern detection with optional
enhanced debug builds:
```bash
# Standard crash test
./assemble.sh --run --component postgis --steps crash-test
# With enhanced debug symbols (recommended for detailed analysis)
./assemble.sh --run --component postgis \
--steps configure,build,install,crash-test \
--debug
```
**Detected Patterns:**
- `[MEMORY-CONTEXT-CORRUPTION]` - mcxt.c:933 assertion failure
- `[GEOMETRY-CACHE]` - shared_gserialized_ref corruption
- `[DISTRIBUTED-GEOMETRY]` - ST_Contains, ST_Intersection crashes
- `[MOTION-NODE]` - Cross-segment data movement
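As a rough illustration of how such tags could be assigned, the sketch below maps a single log or backtrace line to one of the four tags listed above by substring search. The matching rules are assumptions made for this example; the framework's actual detection logic is not shown in this report, and `classify_crash_line` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return the pattern tag for a log or backtrace line, or NULL if no
 * rule matches. Rules are checked from most to least specific. */
const char *classify_crash_line(const char *line)
{
    if (line == NULL)
        return NULL;
    if (strstr(line, "mcxt.c") != NULL && strstr(line, "933") != NULL)
        return "[MEMORY-CONTEXT-CORRUPTION]";
    if (strstr(line, "shared_gserialized_ref") != NULL)
        return "[GEOMETRY-CACHE]";
    if (strstr(line, "ST_Contains") != NULL ||
        strstr(line, "ST_Intersection") != NULL)
        return "[DISTRIBUTED-GEOMETRY]";
    if (strstr(line, "execMotionSender") != NULL ||
        strstr(line, "ExecMotion") != NULL)
        return "[MOTION-NODE]";
    return NULL;
}
```

Fed the `DETAIL: FailedAssertion(...)` line from the error output, this returns `[MEMORY-CONTEXT-CORRUPTION]`; a line that matches none of the rules returns `NULL`.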
**Outputs:**
- Core dump: `/var/crash/core-postgres-*` (420MB with debug symbols)
- GDB analysis: `postgis-crash-analysis-*.txt` (with local variables and
geometry details)
- Summary report: `postgis-crash-summary-*.txt`
**Debug Build Benefits in Core Dumps:**
- Visible geometry bounding boxes with coordinates
- Memory context pointers and relationships
- Cache hit/miss state and entry numbers
- Local variable values in all functions
- Complete parameter lists with types
---
## Contact & Support
**Tested by:** Assembly BOM Development Team
**Test Framework:** https://github.com/cloudberrydb/assembly-bom
**Cloudberry Version:** 3.0.0-devel+dev.2141.g468b1e67dc8
**PostGIS Version:** 3.3.2
**Core Dump Available:** Yes (420MB, full GDB analysis with enhanced debug
symbols)
**Debug Build:** Yes (compiled with -O0 -g3 -ggdb3 for maximum debug
information)
**Reproduction Rate:** 100% consistent (6 core dumps generated in single
test run)
**Environments Tested:** Rocky Linux 9.6, x86_64
**Enhanced Debug Features:**
- ✅ Local variable visibility in all stack frames
- ✅ Geometry bounding box details (xmin, xmax, ymin, ymax)
- ✅ Memory context addresses and relationships
- ✅ Cache state inspection (hit/miss rates, entry numbers)
- ✅ Function parameter values with full type information
### What you think should happen instead
_No response_
### How to reproduce
`psql postgres -f postgis-distributed-crash-test.sql`
### Operating System
Rocky Linux 9.6 (5.14.0-570.17.1.el9_6.x86_64)
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).