rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1629853333
**3. filter_vertices_on_label_id**
This function is used internally by the following query:
```sql
MATCH(:Person)-[:IN]->(t:Title) RETURN t
```
The QPT is:
```
Gather  (cost=735883.94..1181947.48 rows=104664 width=32)
  Workers Planned: 2
  ->  Parallel Hash Join  (cost=734883.94..1170481.08 rows=43610 width=32)
        Hash Cond: (_age_default_alias_0.end_id = t.id)
        ->  Parallel Seq Scan on "IN" _age_default_alias_0  (cost=0.00..284748.30 rows=43610 width=8)
              Filter: ((_extract_label_id(start_id))::integer = 4)
        ->  Parallel Hash  (cost=533288.42..533288.42 rows=4145242 width=270)
              ->  Parallel Seq Scan on "Title" t  (cost=0.00..533288.42 rows=4145242 width=270)
```
This function adds a filter condition to the query plan. In the QPT above, it
builds the line `Filter: ((_extract_label_id(start_id))::integer = 4)`.
Because the person vertex is filtered only by label (i.e. `(:Person)`), with no
property filter or variable, the `Person` table is not joined with the `IN`
table internally. `_extract_label_id` can tell which label a `start_id` belongs
to, which makes the join unnecessary.
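
To illustrate why the label filter can replace the join, here is a minimal Python sketch. It assumes a 64-bit ID with the label ID in the high 16 bits and a per-label entry ID in the low 48 bits. The helper names, the `Title` label ID, and the sample edges are hypothetical; only the `Person` label ID of 4 comes from the plan above.

```python
# Assumed bit layout (for illustration only): high 16 bits = label ID,
# low 48 bits = entry ID within that label.
ENTRY_ID_BITS = 48
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label ID and an entry ID into one integer (hypothetical helper)."""
    return (label_id << ENTRY_ID_BITS) | (entry_id & ENTRY_ID_MASK)


def extract_label_id(graphid: int) -> int:
    """Recover the label ID from the packed ID, mimicking `_extract_label_id`."""
    return graphid >> ENTRY_ID_BITS


PERSON_LABEL_ID = 4  # the constant that appears in the plan's Filter line
TITLE_LABEL_ID = 5   # hypothetical value, chosen only for this example

# Stand-in for rows of the "IN" edge table.
edges = [
    {"start_id": make_graphid(PERSON_LABEL_ID, 1), "end_id": make_graphid(TITLE_LABEL_ID, 10)},
    {"start_id": make_graphid(TITLE_LABEL_ID, 10), "end_id": make_graphid(TITLE_LABEL_ID, 11)},
    {"start_id": make_graphid(PERSON_LABEL_ID, 2), "end_id": make_graphid(TITLE_LABEL_ID, 11)},
]


def edges_from_label(rows, label_id):
    """Keep edges whose start vertex has the given label, without a vertex lookup."""
    return [row for row in rows if extract_label_id(row["start_id"]) == label_id]


matching = edges_from_label(edges, PERSON_LABEL_ID)
```

The filter only inspects the edge row itself, which is why no scan of the `Person` table shows up in the plan.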
In order to drop the concept of `graphid`, we will need to stop using the
function `_extract_label_id`. One alternative is to actually do the join,
though not with the `Person` table itself. A duplicate of the `Person` table
can be used instead. It can be trimmed to only the ID column and indexed
strategically to reduce the join time. This solution is discussed in detail in
issue #1021.
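
As a rough sketch of that alternative, the Python below models the trimmed, ID-only copy of `Person` as a set and filters edges by membership in it. The IDs are treated as opaque values with no label encoded in them. All names and sample data are hypothetical, and the set only stands in for an index on the ID column; it is not the design from issue #1021.

```python
# Hypothetical stand-in for the trimmed, ID-only duplicate of "Person".
# A set gives constant-time membership checks, loosely playing the role
# of an index on the ID column.
person_ids = {101, 102}

# Stand-in for rows of the "IN" edge table as (start_id, end_id) pairs.
# The IDs are opaque: nothing about the label can be read from them.
in_edges = [(101, 900), (555, 901), (102, 901)]


def edges_starting_at(edges, id_lookup):
    """Keep edges whose start_id appears in the ID-only lookup (a semi-join)."""
    return [(start, end) for (start, end) in edges if start in id_lookup]


result = edges_starting_at(in_edges, person_ids)
```

Unlike the label filter, this approach has to consult a second structure for every edge, so its cost depends on how compact and well indexed that structure is.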