rafsun42 commented on issue #1021:
URL: https://github.com/apache/age/issues/1021#issuecomment-1634660959
## Approach 1 - Using a trimmed and indexed version of label table for join
### The query
Cypher query that extracts label ID (see the QPT just below):
```sql
SELECT * FROM cypher('imdb',
$$
EXPLAIN MATCH (p:Person{primaryName:'Christian
Bale'})-[e:IS_IN]->(:Title) RETURN e
$$) as (a agtype);
```
```txt
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Gather (cost=457881.41..742804.15 rows=105 width=32)
Workers Planned: 2
-> Parallel Hash Join (cost=456881.41..741793.65 rows=44 width=32)
Hash Cond: (e.start_id = p.id)
-> Parallel Seq Scan on "IS_IN" e (cost=0.00..284748.30
rows=43610 width=29)
Filter: ((_extract_label_id(end_id))::integer = 3)
-> Parallel Hash (cost=456815.59..456815.59 rows=5266 width=8)
-> Parallel Seq Scan on "Person" p (cost=0.00..456815.59
rows=5266 width=8)
Filter: (properties @>
agtype_build_map('primaryName'::text, '"Christian Bale"'::agtype))
(9 rows)
```
My goal is to replace the use of `_extract_label_id` in the Filter node.
This line is filtering out edges where end node is not a `Title`.
### Building the solution that does not extract label ID
My solution is to create a trimmed and indexed table of Title table. Calling
it `Title_hash`. It has only ID column and indexed by hash method.
```sql
CREATE SCHEMA imdb_dev;
CREATE TABLE imdb_dev."Title_hash" (id graphid);
CREATE INDEX imdb_dev_title_hash_id ON imdb_dev."Title_hash" USING hash (id);
INSERT INTO imdb_dev."Title_hash" SELECT id FROM imdb."Title"; -- load the
new table
```
### Query on the new solution
The SQL query that uses the new table. It is equivalent to the above cypher
query.
```sql
EXPLAIN
SELECT *
FROM imdb."IS_IN" e
JOIN imdb."Person" p
ON e.start_id = p.id
WHERE
(p.properties @> agtype_build_map('primaryName'::text, '"Christian
Bale"'::agtype))
AND
e.end_id IN (SELECT id FROM imdb_dev."Title_hash")
;
```
```txt
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Gather (cost=457881.41..745430.91 rows=20933 width=238)
Workers Planned: 2
-> Nested Loop Semi Join (cost=456881.41..742337.61 rows=8722 width=238)
-> Parallel Hash Join (cost=456881.41..730763.53 rows=8722
width=238)
Hash Cond: (e.start_id = p.id)
-> Parallel Seq Scan on "IS_IN" e (cost=0.00..241138.20
rows=8722020 width=29)
-> Parallel Hash (cost=456815.59..456815.59 rows=5266
width=209)
-> Parallel Seq Scan on "Person" p
(cost=0.00..456815.59 rows=5266 width=209)
Filter: (properties @>
agtype_build_map('primaryName'::text, '"Christian Bale"'::agtype))
-> Index Scan using imdb_dev_title_hash_id on "Title_hash"
(cost=0.00..1.32 rows=1 width=8)
Index Cond: (id = e.end_id)
(11 rows)
```
### Rationale
Because Title_hash has less data per row and it is indexed, joining it would
be faster than joining with the original Title table. The cost of these query
is almost similar.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]