jrgemignani commented on issue #2194:
URL: https://github.com/apache/age/issues/2194#issuecomment-3166241750

   @serdarmicrosoft PR #2199 adds additional variables that can have a huge 
impact on the performance of this query. Below I have added the output of 
explain analyze on my server for the above query -
   
   **SQL baseline**
   ```
   psql-17.5-5432-pgsql=# explain analyze select count(*), u.id from 
goodreads_graph_x."User" u, goodreads_graph_x."HAS_INTERACTION" h, 
goodreads_graph_x."Book" b where u.id = h.start_id and b.id = h.end_id GROUP BY 
u.id ORDER BY count(*) DESC LIMIT 10;
                                                                                
                   QUERY PLAN                                                   
      
                                        
   
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   -------------------------------------
    Limit  (cost=828731.73..828731.75 rows=10 width=16) (actual 
time=4988.478..5016.965 rows=10 loops=1)
      ->  Sort  (cost=828731.73..830343.44 rows=644686 width=16) (actual 
time=4988.476..5016.962 rows=10 loops=1)
            Sort Key: (count(*)) DESC
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Finalize GroupAggregate  (cost=651469.34..814800.29 rows=644686 
width=16) (actual time=4860.272..4986.960 rows=132661 loops=1)
                  Group Key: u.id
                  ->  Gather Merge  (cost=651469.34..801906.57 rows=1289372 
width=16) (actual time=4860.263..4925.715 rows=135627 loops=1)
                        Workers Planned: 2
                        Workers Launched: 2
                        ->  Sort  (cost=650469.31..652081.03 rows=644686 
width=16) (actual time=4852.214..4858.651 rows=45209 loops=3)
                              Sort Key: u.id
                              Sort Method: quicksort  Memory: 2966kB
                              Worker 0:  Sort Method: quicksort  Memory: 2954kB
                              Worker 1:  Sort Method: quicksort  Memory: 2927kB
                              ->  Partial HashAggregate  
(cost=538245.23..577244.80 rows=644686 width=16) (actual 
time=4816.899..4831.099 rows=45209 loops=3)
                                    Group Key: u.id
                                    Planned Partitions: 8  Batches: 1  Memory 
Usage: 6673kB
                                    Worker 0:  Batches: 1  Memory Usage: 6673kB
                                    Worker 1:  Batches: 1  Memory Usage: 6673kB
                                    ->  Parallel Hash Join  
(cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2707.748..3833.753 
rows=3333333 loops=3)
                                          Hash Cond: (h.start_id = u.id)
                                          ->  Parallel Hash Join  
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.647..1876.387 
rows=3333333 loops=3
   )
                                                Hash Cond: (h.end_id = b.id)
                                                ->  Parallel Seq Scan on 
"HAS_INTERACTION" h  (cost=0.00..184515.47 rows=4166747 width=16) (actual 
time=0.039..507.
   552 rows=3333333 loops=3)
                                                ->  Parallel Hash  
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.853..16.854 
rows=31133 loops=3)
                                                      Buckets: 131072  Batches: 
1  Memory Usage: 4768kB
                                                      ->  Parallel Index Only 
Scan using idx_book_id on "Book" b  (cost=0.29..1888.44 rows=38916 width=8) 
(actual t
   ime=0.049..7.166 rows=31133 loops=3)
                                                            Heap Fetches: 0
                                          ->  Parallel Hash  
(cost=12990.05..12990.05 rows=268619 width=8) (actual time=113.746..113.747 
rows=214895 loops=3)
                                                Buckets: 262144 (originally 
262144)  Batches: 16 (originally 8)  Memory Usage: 12352kB
                                                ->  Parallel Index Only Scan 
using "User_id_idx" on "User" u  (cost=0.42..12990.05 rows=268619 width=8) 
(actual tim
   e=0.046..50.956 rows=214895 loops=3)
                                                      Heap Fetches: 0
    Planning Time: 0.929 ms
    Execution Time: 5019.400 ms
   (34 rows)
   
   psql-17.5-5432-pgsql=#
   ```
   
   **Original Cypher query without Book endpoint**
   ```
   psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain 
analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), id(u) ORDER BY 
count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype);
                                                                                
                    QUERY PLAN                                                  
      
                                          
   
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   ---------------------------------------
    Limit  (cost=34041184.37..34041184.40 rows=10 width=64) (actual 
time=82382.918..82407.803 rows=10 loops=1)
      ->  Sort  (cost=34041184.37..34042796.09 rows=644686 width=64) (actual 
time=82382.916..82407.800 rows=10 loops=1)
            Sort Key: ((count(*))::agtype) DESC
            Sort Method: top-N heapsort  Memory: 26kB
            ->  GroupAggregate  (cost=1256126.22..34027252.94 rows=644686 
width=64) (actual time=25704.595..82308.264 rows=132661 loops=1)
                  Group Key: age_id(_agtype_build_vertex(u.id, 
_label_name('237668'::oid, u.id), u.properties))
                  ->  Nested Loop  (cost=1256126.22..21684786.36 
rows=2465914571 width=32) (actual time=25702.797..79040.874 rows=10000000 
loops=1)
                        ->  Gather Merge  (cost=1256126.21..2420813.48 
rows=10000193 width=118) (actual time=25702.610..40552.783 rows=10000000 
loops=1)
                              Workers Planned: 2
                              Workers Launched: 2
                              ->  Sort  (cost=1255126.18..1265543.05 
rows=4166747 width=118) (actual time=25658.964..33652.909 rows=3333333 loops=3)
                                    Sort Key: 
(age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), 
u.properties)))
                                    Sort Method: external merge  Disk: 483984kB
                                    Worker 0:  Sort Method: external merge  
Disk: 491056kB
                                    Worker 1:  Sort Method: external merge  
Disk: 502712kB
                                    ->  Parallel Hash Join  
(cost=21619.93..284274.14 rows=4166747 width=118) (actual 
time=1935.223..13026.837 rows=3333333 loops=3
   )
                                          Hash Cond: 
(_age_default_alias_0.start_id = u.id)
                                          ->  Parallel Seq Scan on 
"HAS_INTERACTION" _age_default_alias_0  (cost=0.00..184515.47 rows=4166747 
width=16) (actual tim
   e=0.050..845.514 rows=3333333 loops=3)
                                          ->  Parallel Hash  
(cost=13802.19..13802.19 rows=268619 width=110) (actual time=318.674..318.675 
rows=214895 loops=3)
                                                Buckets: 65536 (originally 
65536)  Batches: 32 (originally 16)  Memory Usage: 9824kB
                                                ->  Parallel Seq Scan on "User" 
u  (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.160..199.981 
rows=214
   895 loops=3)
                        ->  Memoize  (cost=0.01..1.54 rows=5 width=8) (actual 
time=0.001..0.001 rows=1 loops=10000000)
                              Cache Key: _age_default_alias_0.end_id
                              Cache Mode: logical
                              Hits: 9907873  Misses: 92127  Evictions: 17229  
Overflows: 0  Memory Usage: 8193kB
                              ->  Append  (cost=0.00..1.53 rows=5 width=8) 
(actual time=0.011..0.020 rows=1 loops=92127)
                                    ->  Seq Scan on _ag_label_vertex 
_age_default_alias_1_1  (cost=0.00..0.00 rows=1 width=8) (actual 
time=0.000..0.000 rows=0 loop
   s=92127)
                                          Filter: (id = 
_age_default_alias_0.end_id)
                                    ->  Index Only Scan using "User_id_idx" on 
"User" _age_default_alias_1_2  (cost=0.42..0.44 rows=1 width=8) (actual 
time=0.005..
   0.005 rows=0 loops=92127)
                                          Index Cond: (id = 
_age_default_alias_0.end_id)
                                          Heap Fetches: 0
                                    ->  Index Only Scan using idx_book_id on 
"Book" _age_default_alias_1_3  (cost=0.29..0.31 rows=1 width=8) (actual 
time=0.004..0.
   005 rows=1 loops=92127)
                                          Index Cond: (id = 
_age_default_alias_0.end_id)
                                          Heap Fetches: 0
                                    ->  Index Only Scan using idx_author_id on 
"Author" _age_default_alias_1_4  (cost=0.29..0.31 rows=1 width=8) (actual 
time=0.003
   ..0.003 rows=0 loops=92127)
                                          Index Cond: (id = 
_age_default_alias_0.end_id)
                                          Heap Fetches: 0
                                    ->  Index Only Scan using idx_review_id on 
"Review" _age_default_alias_1_5  (cost=0.43..0.45 rows=1 width=8) (actual 
time=0.005
   ..0.005 rows=0 loops=92127)
                                          Index Cond: (id = 
_age_default_alias_0.end_id)
                                          Heap Fetches: 0
    Planning Time: 1.476 ms
    Execution Time: 82440.110 ms
   (42 rows)
   
   psql-17.5-5432-pgsql=#
   ```
   
   **Original Cypher query with Book endpoint**
   ```
   psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain 
analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), id(u) ORDER 
BY count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype);
                                                                                
                             QUERY PLAN                                         
      
                                                           
   
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------
    Limit  (cost=1007580.65..1007580.67 rows=10 width=64) (actual 
time=18222.030..18248.143 rows=10 loops=1)
      ->  Sort  (cost=1007580.65..1009192.36 rows=644686 width=64) (actual 
time=18222.029..18248.140 rows=10 loops=1)
            Sort Key: ((count(*))::agtype) DESC
            Sort Method: top-N heapsort  Memory: 26kB
            ->  Finalize GroupAggregate  (cost=823871.40..993649.21 rows=644686 
width=64) (actual time=17789.320..18166.135 rows=132661 loops=1)
                  Group Key: (age_id(_agtype_build_vertex(u.id, 
_label_name('237668'::oid, u.id), u.properties)))
                  ->  Gather Merge  (cost=823871.40..974308.63 rows=1289372 
width=40) (actual time=17789.302..17949.998 rows=132923 loops=1)
                        Workers Planned: 2
                        Workers Launched: 2
                        ->  Sort  (cost=822871.37..824483.09 rows=644686 
width=40) (actual time=17762.365..17772.000 rows=44308 loops=3)
                              Sort Key: (age_id(_agtype_build_vertex(u.id, 
_label_name('237668'::oid, u.id), u.properties)))
                              Sort Method: quicksort  Memory: 3503kB
                              Worker 0:  Sort Method: quicksort  Memory: 3432kB
                              Worker 1:  Sort Method: quicksort  Memory: 3393kB
                              ->  Partial HashAggregate  
(cost=674786.11..743035.36 rows=644686 width=40) (actual 
time=17462.389..17476.932 rows=44308 loops=3)
                                    Group Key: 
age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), 
u.properties))
                                    Planned Partitions: 16  Batches: 1  Memory 
Usage: 6673kB
                                    Worker 0:  Batches: 1  Memory Usage: 6417kB
                                    Worker 1:  Batches: 1  Memory Usage: 6417kB
                                    ->  Parallel Hash Join  
(cost=23994.82..342748.46 rows=4166747 width=32) (actual 
time=2755.273..14367.058 rows=3333333 loops=3)
                                          Hash Cond: 
(_age_default_alias_0.start_id = u.id)
                                          ->  Parallel Hash Join  
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.622..1915.252 
rows=3333333 loops=3
   )
                                                Hash Cond: 
(_age_default_alias_0.end_id = _age_default_alias_1.id)
                                                ->  Parallel Seq Scan on 
"HAS_INTERACTION" _age_default_alias_0  (cost=0.00..184515.47 rows=4166747 
width=16) (actu
   al time=0.024..519.098 rows=3333333 loops=3)
                                                ->  Parallel Hash  
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=17.200..17.201 
rows=31133 loops=3)
                                                      Buckets: 131072  Batches: 
1  Memory Usage: 4736kB
                                                      ->  Parallel Index Only 
Scan using idx_book_id on "Book" _age_default_alias_1  (cost=0.29..1888.44 
rows=38916
    width=8) (actual time=0.070..7.131 rows=31133 loops=3)
                                                            Heap Fetches: 0
                                          ->  Parallel Hash  
(cost=13802.19..13802.19 rows=268619 width=110) (actual time=118.590..118.590 
rows=214895 loops=3)
                                                Buckets: 65536 (originally 
65536)  Batches: 32 (originally 16)  Memory Usage: 9792kB
                                                ->  Parallel Seq Scan on "User" 
u  (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.046..44.205 
rows=2148
   95 loops=3)
    Planning Time: 2.422 ms
    Execution Time: 18251.645 ms
   (33 rows)
   
   psql-17.5-5432-pgsql=#
   ```
   
   **Cypher query without Book endpoint using the new vars**
   ```
   psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain 
analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), u_idc ORDER BY 
count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype);
                                                                                
                          QUERY PLAN                                            
      
                                                      
   
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   ---------------------------------------------------
    Subquery Scan on _  (cost=6539203.76..6539203.81 rows=10 width=64) (actual 
time=10232.518..10232.605 rows=10 loops=1)
      ->  Limit  (cost=6539203.76..6539203.79 rows=10 width=40) (actual 
time=10232.514..10232.595 rows=10 loops=1)
            ->  Sort  (cost=6539203.76..6540815.48 rows=644686 width=40) 
(actual time=10232.512..10232.592 rows=10 loops=1)
                  Sort Key: ((count(*))::agtype) DESC
                  Sort Method: top-N heapsort  Memory: 26kB
                  ->  Finalize GroupAggregate  (cost=786218.01..6525272.33 
rows=644686 width=40) (actual time=2704.329..10140.579 rows=132661 loops=1)
                        Group Key: u.id
                        ->  Gather Merge  (cost=786218.01..6510766.89 
rows=1289372 width=16) (actual time=2704.314..10002.115 rows=133999 loops=1)
                              Workers Planned: 2
                              Workers Launched: 2
                              ->  Partial GroupAggregate  
(cost=785217.99..6360941.35 rows=644686 width=16) (actual 
time=2400.315..9630.184 rows=44666 loops=3)
                                    Group Key: u.id
                                    ->  Nested Loop  
(cost=785217.99..1217172.46 rows=1027464405 width=8) (actual 
time=2399.584..9121.389 rows=3333333 loops=3)
                                          ->  Merge Join  
(cost=785217.98..885067.31 rows=4166747 width=16) (actual 
time=2399.385..5151.825 rows=3333333 loops=3)
                                                Merge Cond: 
(_age_default_alias_0.start_id = u.id)
                                                ->  Sort  
(cost=785081.51..795498.38 rows=4166747 width=16) (actual 
time=2399.313..3120.343 rows=3333333 loops=3)
                                                      Sort Key: 
_age_default_alias_0.start_id
                                                      Sort Method: external 
merge  Disk: 87984kB
                                                      Worker 0:  Sort Method: 
external merge  Disk: 83192kB
                                                      Worker 1:  Sort Method: 
external merge  Disk: 83280kB
                                                      ->  Parallel Seq Scan on 
"HAS_INTERACTION" _age_default_alias_0  (cost=0.00..184515.47 rows=4166747 
width=16)
    (actual time=0.045..842.639 rows=3333333 loops=3)
                                                ->  Materialize  
(cost=0.42..18362.43 rows=644686 width=8) (actual time=0.059..560.710 
rows=3933332 loops=3)
                                                      ->  Index Only Scan using 
"User_id_idx" on "User" u  (cost=0.42..16750.72 rows=644686 width=8) (actual 
time=0
   .056..142.077 rows=644665 loops=3)
                                                            Heap Fetches: 0
                                          ->  Memoize  (cost=0.01..1.54 rows=5 
width=8) (actual time=0.001..0.001 rows=1 loops=10000000)
                                                Cache Key: 
_age_default_alias_0.end_id
                                                Cache Mode: logical
                                                Hits: 3386794  Misses: 70936  
Evictions: 0  Overflows: 0  Memory Usage: 7759kB
                                                Worker 0:  Hits: 3200070  
Misses: 69560  Evictions: 0  Overflows: 0  Memory Usage: 7609kB
                                                Worker 1:  Hits: 3201422  
Misses: 71218  Evictions: 0  Overflows: 0  Memory Usage: 7790kB
                                                ->  Append  (cost=0.00..1.53 
rows=5 width=8) (actual time=0.010..0.019 rows=1 loops=211714)
                                                      ->  Seq Scan on 
_ag_label_vertex _age_default_alias_1_1  (cost=0.00..0.00 rows=1 width=8) 
(actual time=0.000.
   .0.000 rows=0 loops=211714)
                                                            Filter: (id = 
_age_default_alias_0.end_id)
                                                      ->  Index Only Scan using 
"User_id_idx" on "User" _age_default_alias_1_2  (cost=0.42..0.44 rows=1 
width=8) (a
   ctual time=0.004..0.004 rows=0 loops=211714)
                                                            Index Cond: (id = 
_age_default_alias_0.end_id)
                                                            Heap Fetches: 0
                                                      ->  Index Only Scan using 
idx_book_id on "Book" _age_default_alias_1_3  (cost=0.29..0.31 rows=1 width=8) 
(act
   ual time=0.004..0.004 rows=1 loops=211714)
                                                            Index Cond: (id = 
_age_default_alias_0.end_id)
                                                            Heap Fetches: 0
                                                      ->  Index Only Scan using 
idx_author_id on "Author" _age_default_alias_1_4  (cost=0.29..0.31 rows=1 
width=8) 
   (actual time=0.003..0.003 rows=0 loops=211714)
                                                            Index Cond: (id = 
_age_default_alias_0.end_id)
                                                            Heap Fetches: 0
                                                      ->  Index Only Scan using 
idx_review_id on "Review" _age_default_alias_1_5  (cost=0.43..0.45 rows=1 
width=8) 
   (actual time=0.005..0.005 rows=0 loops=211714)
                                                            Index Cond: (id = 
_age_default_alias_0.end_id)
                                                            Heap Fetches: 0
    Planning Time: 1.860 ms
    Execution Time: 10250.143 ms
   (47 rows)
   
   psql-17.5-5432-pgsql=#
   ```
   
   **Cypher query with Book endpoint using the new vars**
   ```
   psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain 
analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), u_idc ORDER 
BY count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype);
                                                                                
                                QUERY PLAN                                      
      
                                                                 
   
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------------
    Subquery Scan on _  (cost=830343.44..830343.49 rows=10 width=64) (actual 
time=5091.056..5118.571 rows=10 loops=1)
      ->  Limit  (cost=830343.44..830343.47 rows=10 width=40) (actual 
time=5091.052..5118.561 rows=10 loops=1)
            ->  Sort  (cost=830343.44..831955.16 rows=644686 width=40) (actual 
time=5091.050..5118.558 rows=10 loops=1)
                  Sort Key: ((count(*))::agtype) DESC
                  Sort Method: top-N heapsort  Memory: 26kB
                  ->  Finalize GroupAggregate  (cost=651469.34..816412.01 
rows=644686 width=40) (actual time=4851.058..5033.469 rows=132661 loops=1)
                        Group Key: u.id
                        ->  Gather Merge  (cost=651469.34..801906.57 
rows=1289372 width=16) (actual time=4851.043..4917.321 rows=135554 loops=1)
                              Workers Planned: 2
                              Workers Launched: 2
                              ->  Sort  (cost=650469.31..652081.03 rows=644686 
width=16) (actual time=4840.942..4849.306 rows=45185 loops=3)
                                    Sort Key: u.id
                                    Sort Method: quicksort  Memory: 2962kB
                                    Worker 0:  Sort Method: quicksort  Memory: 
2914kB
                                    Worker 1:  Sort Method: quicksort  Memory: 
2969kB
                                    ->  Partial HashAggregate  
(cost=538245.23..577244.80 rows=644686 width=16) (actual 
time=4803.703..4818.569 rows=45185 loops=3)
                                          Group Key: u.id
                                          Planned Partitions: 8  Batches: 1  
Memory Usage: 6673kB
                                          Worker 0:  Batches: 1  Memory Usage: 
6673kB
                                          Worker 1:  Batches: 1  Memory Usage: 
6673kB
                                          ->  Parallel Hash Join  
(cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2694.604..3822.223 
rows=3333333 loop
   s=3)
                                                Hash Cond: 
(_age_default_alias_0.start_id = u.id)
                                                ->  Parallel Hash Join  
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.696..1863.307 
rows=3333333 l
   oops=3)
                                                      Hash Cond: 
(_age_default_alias_0.end_id = _age_default_alias_1.id)
                                                      ->  Parallel Seq Scan on 
"HAS_INTERACTION" _age_default_alias_0  (cost=0.00..184515.47 rows=4166747 
width=16)
    (actual time=0.043..505.487 rows=3333333 loops=3)
                                                      ->  Parallel Hash  
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.889..16.890 
rows=31133 loops=3)
                                                            Buckets: 131072  
Batches: 1  Memory Usage: 4736kB
                                                            ->  Parallel Index 
Only Scan using idx_book_id on "Book" _age_default_alias_1  (cost=0.29..1888.44 
rows
   =38916 width=8) (actual time=0.068..7.123 rows=31133 loops=3)
                                                                  Heap Fetches: 0
                                                ->  Parallel Hash  
(cost=12990.05..12990.05 rows=268619 width=8) (actual time=115.445..115.446 
rows=214895 loops=3)
                                                      Buckets: 262144 
(originally 262144)  Batches: 16 (originally 8)  Memory Usage: 12352kB
                                                      ->  Parallel Index Only 
Scan using "User_id_idx" on "User" u  (cost=0.42..12990.05 rows=268619 width=8) 
(actu
   al time=0.056..51.793 rows=214895 loops=3)
                                                            Heap Fetches: 0
    Planning Time: 1.453 ms
    Execution Time: 5123.253 ms
   (35 rows)
   
   psql-17.5-5432-pgsql=#
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@age.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to