Re: [PR] Add index scan [age]

via GitHub Tue, 03 Mar 2026 11:57:20 -0800


jrgemignani commented on PR #2351:
URL: https://github.com/apache/age/pull/2351#issuecomment-3993202685


   @sandy-bes Please see this as well (Opus 4.6 review). All of these items 
need to be address.
   
   ## Review of PR #2351 — Add index scan
   
   The core idea — replacing O(N) sequential scans with O(log N) index scans in 
`entity_exists` and related hot paths — is excellent, and the performance 
numbers are compelling. However, there are several correctness, security, and 
code quality issues that need to be addressed before this can be merged.
   
   ### Critical Issues
   
   **1. RLS bypass in `cypher_set.c` index-scan path (security bug)**
   
   The index-scan branch in `process_update_list()` calls 
`update_entity_tuple()` directly without checking `check_security_quals`. The 
seq-scan fallback correctly gates the update on RLS USING-policy evaluation. 
This means SET/REMOVE will bypass row-level security when an index is present. 
The same RLS gating logic must be applied in the index-scan branch.
   
   **2. `heap_freetuple` on wrong tuple in `cypher_set.c` (memory corruption)**
   
   ```c
   heap_tuple = ExecFetchSlotHeapTuple(index_slot, true, &shouldFree);
   // ...
   heap_tuple = update_entity_tuple(resultRelInfo, slot, estate, heap_tuple);
   // ...
   if (shouldFree)
       heap_freetuple(heap_tuple);  // <-- frees the UPDATE result, not the 
fetched tuple
   ```
   
   After `update_entity_tuple()`, `heap_tuple` points to the new tuple, but 
`shouldFree` applies to the originally fetched tuple. This can free the wrong 
memory. Keep the fetched tuple in a separate variable and only free that one.
   
   ### Correctness Issues
   
   **3. `F_OIDEQ` vs `F_GRAPHIDEQ` mismatch in `agtype.c` fallback path**
   
   The index path correctly uses `F_GRAPHIDEQ`, but the seq-scan fallback at 
line ~5709 still uses `F_OIDEQ`. The column type is `graphid`, so both paths 
should use `F_GRAPHIDEQ` / `GRAPHID_GET_DATUM`.
   
   **4. Index attno vs heap attno confusion in `ag_label.c`**
   
   `ScanKeyInit` for the index scan uses `Anum_ag_label_name` as the attribute 
number. For index scans, the attno should be the index column number (typically 
1), not a heap attribute-number macro. Additionally, the index is on 
`graph_oid`, so referencing `Anum_ag_label_name` is semantically incorrect — it 
only works by coincidence if the value happens to equal 1. Consider using 
`ag_label_graph_oid_index_id()` instead of `get_relname_relid()` string lookup 
as well.
   
   ### Performance Issue
   
   **5. Index discovery runs per-tuple in hot paths**
   
   In `process_delete_list()` and `process_update_list()`, the index discovery 
block (`RelationGetIndexList` + open/close each index) runs inside the 
per-entity loop. For a DELETE or SET touching N entities of the same label, 
this runs N times. The index OID should be cached per-label or discovered once 
outside the inner loop.
   
   ### Code Quality Issues
   
   **6. Massive code duplication**
   
   The index discovery pattern (iterate `RelationGetIndexList`, open each, 
check `indisvalid`/`indnatts`/`indkey.values[0]`) is copy-pasted 6+ times 
across the changed files. This should be factored into a shared utility 
function, e.g.:
   
   ```c
   Oid find_usable_index_for_attr(Relation rel, AttrNumber attnum);
   ```
   
   **7. DETACH DELETE duplication in `cypher_delete.c`**
   
   The two-pass connected-edge check (START_ID pass + END_ID pass) duplicates 
~120 lines of nearly identical ACL check + RLS check + delete + error logic. A 
helper function would reduce this significantly.
   
   **8. C++ style comments**
   
   New code uses `//` comments in `ag_label.c` and elsewhere. The 
AGE/PostgreSQL codebase uses `/* ... */` exclusively. Please convert for 
consistency and C90 portability.
   
   **9. Brace style inconsistency**
   
   The PR mixes `} else {` with the existing codebase convention of:
   
   ```c
   }
   else
   {
   ```
   
   Please follow the existing style.
   
   **10. Inconsistent lock levels**
   
   Index open calls use `RowExclusiveLock` in some read-only scan paths 
(`ag_label.c` label enumeration, `age_global_graph.c`) where `AccessShareLock` 
would be more appropriate. The lock levels for index access during reads vs 
writes should be reviewed for consistency.
   
   ### Summary
   
   | # | Issue | Severity |
   |---|---|---|
   | 1 | RLS bypass in SET index path | **Critical** |
   | 2 | `heap_freetuple` on wrong tuple | **High** |
   | 3 | `F_OIDEQ` vs `F_GRAPHIDEQ` mismatch | **Medium** |
   | 4 | Index attno vs heap attno confusion | **Medium** |
   | 5 | Index discovery per-tuple in loops | **Medium** |
   | 6 | Index discovery code duplication (6x) | **High** |
   | 7 | DETACH DELETE logic duplication | **Medium** |
   | 8 | C++ style comments | **Low** |
   | 9 | Brace style inconsistency | **Low** |
   | 10 | Inconsistent lock levels | **Medium** |
   
   The performance improvement is real and valuable. With the security fix 
(#1), memory safety fix (#2), correctness fixes (#3-4), and a refactoring pass 
to deduplicate the index discovery logic (#5-7), this will be in good shape.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add index scan [age]

Reply via email to