Re: [I] Analyse MATCH clause performance (relationship basics) [age]

via GitHub Sun, 08 Oct 2023 08:24:00 -0700


Zainab-Saad commented on issue #982:
URL: https://github.com/apache/age/issues/982#issuecomment-1752059252


   Tested with around 1048567 edges, 12637040 and 9960576 two differently 
labelled vertices.
   Label redesign branch chooses normal seq scan over parallel seq scan for 
simple edge finding `MATCH(:Person)-[e:EDGE]->(:Title)`
   In the following queries, I have first tested without changing any 
configuration parameters, but after that changed the `parallel_tuple_cost` 
parameter to 0 in order to force a parallel seq scan for testing. This resulted 
in less than half the execution time as before. 
   ```
   agedb=# SHOW parallel_tuple_cost;
    parallel_tuple_cost 
   ---------------------
    0.1
   (1 row)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()              
   RETURN e                                                   
   $$) AS (result agtype);
                                                         QUERY PLAN             
                                         
   
----------------------------------------------------------------------------------------------------------------------
    Seq Scan on "EDGE" e  (cost=0.00..33392.76 rows=1048567 width=32) (actual 
time=0.047..4619.293 rows=1048567 loops=1)
      Filter: (start_label_id = 3)
    Planning Time: 0.220 ms
    Execution Time: 4670.069 ms
   (4 rows)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)              
   RETURN e
   $$) AS (result agtype);
                                                         QUERY PLAN             
                                         
   
----------------------------------------------------------------------------------------------------------------------
    Seq Scan on "EDGE" e  (cost=0.00..36014.17 rows=1048567 width=32) (actual 
time=0.045..4615.328 rows=1048567 loops=1)
      Filter: ((start_label_id = 3) AND (end_label_id = 5))
    Planning Time: 0.150 ms
    Execution Time: 4665.713 ms
   (4 rows)
   
   agedb=# SET parallel_tuple_cost TO 0;
   SET
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()              
   RETURN e
   $$) AS (result agtype);
                                                               QUERY PLAN       
                                                      
   
-----------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..20630.32 rows=1048567 width=32) (actual 
time=0.561..1606.535 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..19630.32 rows=436903 
width=32) (actual time=0.246..1543.629 rows=349522 loops=3)
            Filter: (start_label_id = 3)
    Planning Time: 0.147 ms
    Execution Time: 1660.394 ms
   (7 rows)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)              
   RETURN e
   $$) AS (result agtype);
                                                               QUERY PLAN       
                                                      
   
-----------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..21722.57 rows=1048567 width=32) (actual 
time=0.401..1622.878 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..20722.57 rows=436903 
width=32) (actual time=0.219..1554.710 rows=349522 loops=3)
            Filter: ((start_label_id = 3) AND (end_label_id = 5))
    Planning Time: 0.110 ms
    Execution Time: 1676.656 ms
   (7 rows)
   
   agedb=# SHOW parallel_setup_cost;
    parallel_setup_cost 
   ---------------------
    1000
   (1 row)
   
   agedb=# SET parallel_setup_cost TO 10;
   SET
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()              
   RETURN e
   $$) AS (result agtype);
                                                               QUERY PLAN       
                                                      
   
-----------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=10.00..19640.32 rows=1048567 width=32) (actual 
time=0.471..1604.505 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..19630.32 rows=436903 
width=32) (actual time=0.347..1540.499 rows=349522 loops=3)
            Filter: (start_label_id = 3)
    Planning Time: 0.147 ms
    Execution Time: 1658.601 ms
   (7 rows)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)              
   RETURN e
   $$) AS (result agtype);
                                                               QUERY PLAN       
                                                      
   
-----------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=10.00..20732.57 rows=1048567 width=32) (actual 
time=0.341..1617.122 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..20722.57 rows=436903 
width=32) (actual time=0.240..1553.976 rows=349522 loops=3)
            Filter: ((start_label_id = 3) AND (end_label_id = 5))
    Planning Time: 0.110 ms
    Execution Time: 1672.127 ms
   (7 rows)
   ```
   
   Below is the QPT for these queries run on AGE (without label redesign and 
the HEAD of the branch is at commit `b4574f5df8de721eed0ed67a2788075b387af418` 
which is the base of the 
[label_redesign](https://github.com/rafsun42/age/tree/label_redesign) branch)
   I have first tested without changing any configuration parameters, but after 
that changed the `max_parallel_workers_per_gather ` to 0 in order to force the 
normal seq scan instead of parallel scan (did this for comparison of execution 
times with the above QPTs).
   ```
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()              
   RETURN e                                                   
   $$) AS (result agtype);
                                                              QUERY PLAN        
                                                   
   
--------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..15799.77 rows=5243 width=32) (actual 
time=0.343..875.155 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..14275.47 rows=2185 
width=32) (actual time=0.414..819.148 rows=349522 loops=3)
            Filter: ((_extract_label_id(start_id))::integer = 3)
    Planning Time: 0.095 ms
    Execution Time: 929.516 ms
   (7 rows)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)              
   RETURN e
   $$) AS (result agtype);
                                                             QUERY PLAN         
                                                 
   
------------------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..17451.71 rows=26 width=32) (actual 
time=0.491..879.482 rows=1048567 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      ->  Parallel Seq Scan on "EDGE" e  (cost=0.00..16449.11 rows=11 width=32) 
(actual time=3.546..829.526 rows=349522 loops=3)
            Filter: (((_extract_label_id(start_id))::integer = 3) AND 
((_extract_label_id(end_id))::integer = 4))
    Planning Time: 0.137 ms
    Execution Time: 933.877 ms
   (7 rows)
   
   agedb=# SHOW max_parallel_workers_per_gather;
    max_parallel_workers_per_gather 
   ---------------------------------
    2
   (1 row)
   
   agedb=# SET max_parallel_workers_per_gather TO 0;
   SET
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()              
   RETURN e
   $$) AS (result agtype);
                                                       QUERY PLAN               
                                      
   
-------------------------------------------------------------------------------------------------------------------
    Seq Scan on "EDGE" e  (cost=0.00..23465.72 rows=5243 width=32) (actual 
time=0.088..2397.991 rows=1048567 loops=1)
      Filter: ((_extract_label_id(start_id))::integer = 3)
    Planning Time: 0.115 ms
    Execution Time: 2448.949 ms
   (4 rows)
   
   agedb=# SELECT * FROM cypher('imdb', $$
   EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)              
   RETURN e
   $$) AS (result agtype);
                                                      QUERY PLAN                
                                    
   
-----------------------------------------------------------------------------------------------------------------
    Seq Scan on "EDGE" e  (cost=0.00..28682.47 rows=26 width=32) (actual 
time=0.081..2464.521 rows=1048567 loops=1)
      Filter: (((_extract_label_id(start_id))::integer = 3) AND 
((_extract_label_id(end_id))::integer = 4))
    Planning Time: 0.127 ms
    Execution Time: 2515.350 ms
   (4 rows)
   ```
   
   @rafsun42 @CapnSpek 
   Do you think is it something for which we should further look into code to 
improve this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Analyse MATCH clause performance (relationship basics) [age]

Reply via email to