rafsun42 commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1629853333

   **3. filter_vertices_on_label_id**
   
   This function is used internally by the following query:
   ```sql
   MATCH(:Person)-[:IN]->(t:Title) RETURN t
   ```
   The QPT is:
   ```
    Gather  (cost=735883.94..1181947.48 rows=104664 width=32)
      Workers Planned: 2
      ->  Parallel Hash Join  (cost=734883.94..1170481.08 rows=43610 width=32)
            Hash Cond: (_age_default_alias_0.end_id = t.id)
            ->  Parallel Seq Scan on "IN" _age_default_alias_0  (cost=0.00..284748.30 rows=43610 width=8)
                  Filter: ((_extract_label_id(start_id))::integer = 4)
            ->  Parallel Hash  (cost=533288.42..533288.42 rows=4145242 width=270)
                  ->  Parallel Seq Scan on "Title" t  (cost=0.00..533288.42 rows=4145242 width=270)
   ```
   The function adds a filter condition to the query plan. In the above QPT, it
builds this line:
`Filter: ((_extract_label_id(start_id))::integer = 4)`.
   
   Because `Person` is filtered only by label (i.e. `(:Person)`), with no
property filter or variable, the `Person` table is not joined with the `IN`
table internally. `_extract_label_id` can tell which label `start_id` belongs
to, which makes the join unnecessary.
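
To illustrate why the label can be read from `start_id` alone, here is a minimal Python model of a 64-bit ID that carries its label ID in the high bits. The 16-bit label / 48-bit entry split is my understanding of AGE's `graphid` layout and should be treated as an assumption. The helper names are hypothetical and are not AGE functions.

```python
# Assumed layout: [ 16-bit label id | 48-bit entry id ] packed into 64 bits.
ENTRY_ID_BITS = 48
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1
LABEL_ID_MAX = (1 << 16) - 1


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label id and an entry id into one integer (hypothetical helper)."""
    if not 1 <= label_id <= LABEL_ID_MAX:
        raise ValueError("label_id out of range")
    if not 1 <= entry_id <= ENTRY_ID_MASK:
        raise ValueError("entry_id out of range")
    return (label_id << ENTRY_ID_BITS) | entry_id


def extract_label_id(graphid: int) -> int:
    """Recover the label id from the high bits; no table lookup is needed."""
    return graphid >> ENTRY_ID_BITS


def extract_entry_id(graphid: int) -> int:
    """Recover the per-label entry id from the low bits."""
    return graphid & ENTRY_ID_MASK


# Label id 4 is the value the plan above compares against.
start_id = make_graphid(4, 123)
print(extract_label_id(start_id) == 4)
```

Under this model, the filter in the plan is a pure function of the edge's `start_id` column, so the vertex table never has to be scanned.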
   
   In order to drop the concept of `graphid`, we will need to stop using the
function `_extract_label_id`. One alternative is to actually do the join, but
not with the `Person` table itself: a duplicate of `Person` can be used
instead. It can be trimmed to only the ID column and indexed strategically to
reduce the join time. This solution is discussed in detail in issue #1021.
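
As a rough illustration only (this is not the design from issue #1021), the sketch below contrasts the two ways of selecting `IN` edges that start at a `Person`: decoding the label from the ID, and a membership lookup against an ID-only set. The set stands in for the trimmed, indexed copy of `Person`. The sample data, the label ID 5, and all function names are made up. The bit layout is assumed and is used only to build sample IDs.

```python
ENTRY_ID_BITS = 48  # assumed ID layout, used only to build sample IDs


def graphid(label_id: int, entry_id: int) -> int:
    return (label_id << ENTRY_ID_BITS) | entry_id


PERSON, TITLE = 4, 5  # 4 comes from the plan above; 5 is made up

# Stand-in for the trimmed, indexed, ID-only copy of "Person" (hypothetical).
person_ids = {graphid(PERSON, n) for n in (1, 2, 3)}

# Stand-in for the "IN" edge table: (start_id, end_id) pairs.
in_edges = [
    (graphid(PERSON, 1), graphid(TITLE, 10)),
    (graphid(TITLE, 11), graphid(TITLE, 12)),  # start vertex is not a Person
    (graphid(PERSON, 3), graphid(TITLE, 10)),
]


def end_ids_by_label_filter(edges, label_id):
    """Current approach: decode the label from start_id, no join."""
    return [end for start, end in edges if start >> ENTRY_ID_BITS == label_id]


def end_ids_by_id_join(edges, ids):
    """Alternative: semi-join against the ID-only set; IDs stay opaque."""
    return [end for start, end in edges if start in ids]


print(end_ids_by_label_filter(in_edges, PERSON) == end_ids_by_id_join(in_edges, person_ids))
```

The second function never inspects the bits of `start_id`, which is the property needed once the label is no longer encoded in the ID. The cost moves to maintaining and probing the extra ID-only structure.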

