JoshInnis opened a new issue #89:
URL: https://github.com/apache/incubator-age/issues/89
### Background
The ExecMaterial node in an Execution Tree may not rescan it's children
nodes. The material node acts as an intermediary between it's parent and it's
children. During the initial run, MaterialNode will execute its subplans, cache
the results in a data structure called a Tuplestorestate (postgres has only
exposed the name of, the structure is hidden in Tuplestorestate.c), and pass
the child tuple to its parent.
When the node is notified it will be rescanned or it will change scan
direction. The material node will try to get the needed tuple in its local
Tuplestorestate before scanning the child Append node. Assuming the above query
will return at least one result, the tuplestorestate will be used.
The following query will use a MaterialNode in it's execution:
`SELECT * FROM cypher('G', $$ match (a)-[e]->(b) delete e return e$$) AS (u
agtype);`
The Execution Tree:
```
SELECT * FROM cypher('G', $$EXPLAIN match ()-[e]->(b) delete e return e$$)
AS (u agtype);
QUERY PLAN
------------------------------------------------------------------------------------------------
Custom Scan (Cypher Delete) (cost=0.00..0.00 rows=0 width=32)
-> Subquery Scan on _ (cost=0.00..3.06 rows=3 width=32)
-> Nested Loop (cost=0.00..3.03 rows=3 width=96)
Join Filter: (graphid_to_agtype(e.end_id) =
graphid_to_agtype(b.id))
-> Append (cost=0.00..1.03 rows=3 width=38)
-> Seq Scan on _ag_label_edge e (cost=0.00..0.00
rows=1 width=56)
-> Seq Scan on edge e_1 (cost=0.00..1.02 rows=2
width=29)
-> Materialize (cost=0.00..1.05 rows=3 width=8)
-> Append (cost=0.00..1.03 rows=3 width=8)
-> Seq Scan on _ag_label_vertex b
(cost=0.00..0.00 rows=1 width=8)
-> Seq Scan on vertex b_1 (cost=0.00..1.02
rows=2 width=8)
(11 rows)
```
Cypher's Delete clause (The Custom Scan node at the top of the execution
tree) attempts to dive into its children to get the physical tuple that it has
been asked to delete. Which is stored in the SeqScan, for the above query
(DELETE, CREATE, SET, or REMOVE may store it also). It needs this information
because we need the delete clause to mark the tuple as deleted*.
*Deleting a tuple in Postgres whether using Cypher's or Postgres' delete
functionality does't actually delete data, they just mark it as no longer a
valid tuple for future queries.
### Problem
The problem is that when the Materialize Node passes a cached tuple: the
SeqScan is not going to be holding the physical tuple anymore, NULL will be
returned to Cypher Delete node. The delete clause then cannot find the tuple
and doesn't delete the tuple.
### Solution
Instead of the DELETE (or SET/REMOVE) using its children to get the physical
location of the tuple on disc, the clause get it themselves. We start a new
scan on the table until we find the physical tuple. Use the id, which is unique
per edge/vertex in any given graph.
### Considerations
This will slow down performance, because we have to scan the disc for the
tuple's location ourselves. However, using the execution tree when possible and
scanning the tables when not will be time consuming and potentially
inconsequential relative to other performance solutions that could render this
dual retrieval solution useless.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]