gregfelice opened a new issue, #2381:
URL: https://github.com/apache/age/issues/2381
## Summary
Add openCypher `FOREACH` clause support to AGE. `FOREACH` is a common ETL /
iterative-update construct and is one of the few remaining Phase 1 Cypher
parity gaps — along with pattern expressions in WHERE (#1577, PR #2360),
predicate functions (PR #2359), and `MERGE ON CREATE/MATCH SET` (PR #2347).
## Cypher semantics (openCypher / Neo4j)
```
FOREACH (var IN list-expression | update-clause [update-clause ...])
```
- Body may contain **only update clauses**: `CREATE`, `MERGE`, `SET`,
`REMOVE`, `DELETE`, and nested `FOREACH`.
- Body runs once per list element; `var` is bound to the current element and
is in scope only within the body clauses.
- Produces no new rows in the outer query — the outer row set passes through
unchanged.
- No read clauses (`MATCH`, `WITH`, `RETURN`) inside the body.
- Empty or NULL list → no-op, outer rows preserved.

Examples:
```cypher
// Create nodes from a list
FOREACH (name IN ['Alice','Bob','Carol'] | CREATE (:Person {name: name}))
// Per-row iterative update
MATCH (p:Person)
FOREACH (tag IN p.tags | SET p.tag_count = coalesce(p.tag_count, 0) + 1)
// Idempotent tag creation
MATCH (p:Person)
FOREACH (t IN p.tag_names |
  MERGE (tag:Tag {name: t})
  MERGE (p)-[:HAS_TAG]->(tag))
```
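
The semantics listed above can be modeled in a few lines of Python. This is only a sketch of the intended behavior, not AGE code: the `foreach` helper, the dict-based rows, and the `created` list standing in for the graph are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

def foreach(rows: List[Row], var: str,
            list_expr: Callable[[Row], Optional[list]],
            body: Callable[[Row], None]) -> List[Row]:
    """Toy model of FOREACH (var IN list_expr | body) over an outer row set."""
    for row in rows:
        elements = list_expr(row)
        if elements is None:           # NULL list: no-op for this row
            continue
        for element in elements:       # empty list: loop body never runs
            scope = dict(row)          # copy, so `var` never leaks into the outer row
            scope[var] = element
            body(scope)                # side effects only; nothing is returned
    return rows                        # outer rows pass through unchanged

# Hypothetical stand-in for the graph: a list of created "nodes".
created: List[Dict[str, Any]] = []
outer = [
    {"p": "Alice", "tags": ["a", "b"]},
    {"p": "Bob", "tags": []},          # empty list
    {"p": "Carol", "tags": None},      # NULL list
]
result = foreach(outer, "t", lambda r: r["tags"],
                 lambda s: created.append({"label": "Tag", "name": s["t"], "owner": s["p"]}))
```

Only Alice's two tags produce side effects; all three outer rows are still present afterwards, and none of them carries a `t` binding.
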
## Why FOREACH is not UNWIND
`UNWIND` flattens a list into the row stream: every element becomes an
outer row, and an empty list removes the row entirely. `FOREACH` is the
opposite: its body runs side-effecting update clauses once per element, but
the outer row set is unchanged. The two are interchangeable only when nothing
downstream depends on the row count; otherwise a later projection or
aggregation would see each row multiplied by the list length.
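
To make the row-count difference concrete, here is a small Python model of `UNWIND`. It is a sketch under the same assumptions as any toy model: rows are plain dicts, and `unwind` is a hypothetical helper, not an AGE function.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

def unwind(rows: List[Row], var: str,
           list_expr: Callable[[Row], Optional[list]]) -> List[Row]:
    """Toy model of UNWIND list_expr AS var: one output row per list element."""
    out: List[Row] = []
    for row in rows:
        for element in list_expr(row) or []:   # empty or NULL list yields no rows
            out.append({**row, var: element})  # `var` becomes part of the row
    return out

rows = [
    {"p": "Alice", "tags": ["a", "b", "c"]},
    {"p": "Bob", "tags": []},
]
unwound = unwind(rows, "t", lambda r: r["tags"])

# After UNWIND, Alice appears three times and Bob has disappeared.
# After FOREACH, the outer row set would still be exactly `rows` (two rows).
names_after_unwind = [r["p"] for r in unwound]
```

A downstream `RETURN count(p)` would therefore see three rows after `UNWIND` but two after `FOREACH`, which is why the rewrite is not always safe.
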
## Existing AGE infrastructure that can be reused
- `cypher_unwind` node + `transform_cypher_unwind`
(`src/backend/parser/cypher_clause.c:1440`) — list iteration, element-variable
binding, `UNWIND expr AS var` grammar shape.
- `transform_cypher_set_item_list`
(`src/backend/parser/cypher_clause.c:1862`) — per-item update list transform,
already parameterized via `cypher_update_item`.
- Existing CustomScan executor nodes for `cypher_create`, `cypher_set`,
`cypher_delete`, `cypher_merge` — these are exactly the body clauses `FOREACH`
needs to invoke per iteration.
## Proposed implementation strategy
Two viable paths; happy to take maintainer input before writing code.

**Option A: New `cypher_foreach` CustomScan node (preferred).** Analogous
to `cypher_create`. Holds (a) the list expression, (b) pre-built child
update-clause plans, (c) a per-element tuple slot. Per outer tuple: iterate the
list, bind `var`, invoke each child's executor in sequence; no tuples emitted.
This matches AGE's existing architecture for write clauses and gives clean
semantics (body runs, outer row passes through).

**Option B: Lower to a side-effecting SubPlan.** Transform `FOREACH (x IN
list | body)` into a correlated SubPlan that UNWINDs `list` and runs the body
clauses, attached to the outer query so that it is evaluated once per outer
row and its output is discarded. This needs less new code, but the
row-preservation guarantees are harder to reason about.

Option A is probably the path that fits AGE best.
## Sketch of the code changes
**Grammar (`cypher_gram.y`)**
- New `foreach` non-terminal mirroring `unwind` (line ~974).
- New parse node `cypher_foreach` mirroring `cypher_unwind` with fields:
`target_name`, `expr`, `body_clauses`.
- Register in the `clause` alternation and the transform dispatch in
`transform_cypher_clause` (`cypher_clause.c:504`).
- Reject non-update body clauses at parse time with a location-bearing error.

**Transform (`transform_cypher_foreach`)**
- Push a parse scope with `var` bound to the element type.
- Recursively transform each body clause — each becomes its own Query,
chained as children of the `cypher_foreach` node.
- Validate body is update-only (`cypher_create` / `cypher_set` /
`cypher_delete` / `cypher_merge` / nested `cypher_foreach`).
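
The update-only check, including recursion into nested `FOREACH`, could look roughly like the Python sketch below. It models the rule only; the `Clause` dataclass, `ForeachBodyError`, and `validate_foreach_body` are hypothetical names, and the real check would operate on AGE parse nodes in C.

```python
from dataclasses import dataclass, field
from typing import List

# Clause kinds permitted inside a FOREACH body, per the semantics above.
UPDATE_KINDS = {"CREATE", "MERGE", "SET", "REMOVE", "DELETE", "FOREACH"}

@dataclass
class Clause:
    kind: str                      # e.g. "CREATE", "MATCH"
    location: int                  # character offset in the query text
    body: List["Clause"] = field(default_factory=list)  # only used by FOREACH

class ForeachBodyError(Exception):
    """Raised for a non-update clause; carries the offending location."""
    def __init__(self, kind: str, location: int):
        super().__init__(f"{kind} is not allowed inside FOREACH (at offset {location})")
        self.kind = kind
        self.location = location

def validate_foreach_body(body: List[Clause]) -> None:
    for clause in body:
        if clause.kind not in UPDATE_KINDS:
            raise ForeachBodyError(clause.kind, clause.location)
        if clause.kind == "FOREACH":
            validate_foreach_body(clause.body)   # nested bodies obey the same rule

ok_body = [Clause("MERGE", 10), Clause("FOREACH", 30, [Clause("SET", 45)])]
bad_body = [Clause("CREATE", 10), Clause("FOREACH", 30, [Clause("MATCH", 42)])]

validate_foreach_body(ok_body)     # passes silently
try:
    validate_foreach_body(bad_body)
    message = None
except ForeachBodyError as err:
    message = str(err)
```

Carrying the location on the exception mirrors the "location-bearing error" requirement: the nested `MATCH` is reported at its own offset, not at the offset of the enclosing `FOREACH`.
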

**Executor**
- New `src/backend/executor/cypher_foreach.c` analogous to `cypher_create.c`.
- `ExecCypherForeach` iterates the evaluated list, sets the `ecxt_scantuple`
element slot, calls each child update executor in sequence, performs
per-iteration cleanup, and emits no tuples — the outer tuple passes through.
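
The control flow described for `ExecCypherForeach` can be sketched as a pull-based generator in Python. This is an illustration of the ordering only, assuming each child update executor can be treated as a callable; `exec_foreach`, `children`, and `cleanup` are hypothetical names and do not correspond to PostgreSQL or AGE APIs.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

OuterTuple = Dict[str, Any]
ChildExecutor = Callable[[OuterTuple, Any], None]   # (outer tuple, bound element)

def exec_foreach(outer: Iterable[OuterTuple],
                 list_expr: Callable[[OuterTuple], Optional[List[Any]]],
                 children: List[ChildExecutor],
                 cleanup: Callable[[], None]) -> Iterator[OuterTuple]:
    """Toy model: per outer tuple, run every child once per element, then pass it on."""
    for outer_tuple in outer:
        for element in list_expr(outer_tuple) or []:   # NULL or empty list: skip
            for child in children:                     # children run in declared order
                child(outer_tuple, element)
            cleanup()                                  # per-iteration cleanup
        yield outer_tuple                              # body emits nothing of its own

trace: List[tuple] = []
plan = exec_foreach(
    [{"p": "Alice", "tags": ["x", "y"]}],
    lambda t: t["tags"],
    [lambda t, e: trace.append(("merge_tag", e)),
     lambda t, e: trace.append(("merge_edge", e))],
    lambda: trace.append(("cleanup",)),
)
emitted = list(plan)   # pulling from the node is what drives the side effects
```

The iteration is element-major: both children run for `x` before either runs for `y`, which is what the `MERGE` + `MERGE` tag example relies on, since the second clause refers to the node produced by the first.
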

**Regression tests (`regress/sql/cypher_foreach.sql`)**
- Smoke: `FOREACH (x IN [1,2,3] | CREATE (:N {v: x}))` → count check.
- Nested SET: `MATCH (n:Person) FOREACH (tag IN n.tags | SET n.tag_count =
n.tag_count + 1)`.
- MERGE inside FOREACH: idempotent tag creation pattern.
- Nested FOREACH.
- Reject reads: `FOREACH (x IN list | MATCH ...)` → parse error with
location.
- Empty list: no-op, outer rows preserved.
- NULL list: treat as empty (Neo4j semantics).
## Open questions for maintainers
1. Preference on Option A vs Option B above?
2. Any concerns about adding a new CustomScan node in
`src/backend/executor/` vs slotting into an existing file?
3. Should `RETURN`-inside-FOREACH produce a dedicated error message, or fall
through the general "unexpected clause" path?

Happy to own this. I wanted to file the issue first to align on strategy
before writing code, given the scope.