jrgemignani commented on issue #2194: URL: https://github.com/apache/age/issues/2194#issuecomment-3166241750
@serdarmicrosoft PR #2199 adds additional variables that can have a huge impact on the performance of this query. Below I have added the output of explain analyze on my server for the above query - **SQL baseline** ``` psql-17.5-5432-pgsql=# explain analyze select count(*), u.id from goodreads_graph_x."User" u, goodreads_graph_x."HAS_INTERACTION" h, goodreads_graph_x."Book" b where u.id = h.start_id and b.id = h.end_id GROUP BY u.id ORDER BY count(*) DESC LIMIT 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------- Limit (cost=828731.73..828731.75 rows=10 width=16) (actual time=4988.478..5016.965 rows=10 loops=1) -> Sort (cost=828731.73..830343.44 rows=644686 width=16) (actual time=4988.476..5016.962 rows=10 loops=1) Sort Key: (count(*)) DESC Sort Method: top-N heapsort Memory: 25kB -> Finalize GroupAggregate (cost=651469.34..814800.29 rows=644686 width=16) (actual time=4860.272..4986.960 rows=132661 loops=1) Group Key: u.id -> Gather Merge (cost=651469.34..801906.57 rows=1289372 width=16) (actual time=4860.263..4925.715 rows=135627 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=650469.31..652081.03 rows=644686 width=16) (actual time=4852.214..4858.651 rows=45209 loops=3) Sort Key: u.id Sort Method: quicksort Memory: 2966kB Worker 0: Sort Method: quicksort Memory: 2954kB Worker 1: Sort Method: quicksort Memory: 2927kB -> Partial HashAggregate (cost=538245.23..577244.80 rows=644686 width=16) (actual time=4816.899..4831.099 rows=45209 loops=3) Group Key: u.id Planned Partitions: 8 Batches: 1 Memory Usage: 6673kB Worker 0: Batches: 1 Memory Usage: 6673kB Worker 1: Batches: 1 Memory Usage: 6673kB -> Parallel Hash Join (cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2707.748..3833.753 rows=3333333 loops=3) Hash Cond: (h.start_id = u.id) -> Parallel Hash Join (cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.647..1876.387 rows=3333333 loops=3 ) Hash Cond: (h.end_id = b.id) -> Parallel Seq Scan on "HAS_INTERACTION" h (cost=0.00..184515.47 rows=4166747 width=16) (actual time=0.039..507. 552 rows=3333333 loops=3) -> Parallel Hash (cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.853..16.854 rows=31133 loops=3) Buckets: 131072 Batches: 1 Memory Usage: 4768kB -> Parallel Index Only Scan using idx_book_id on "Book" b (cost=0.29..1888.44 rows=38916 width=8) (actual t ime=0.049..7.166 rows=31133 loops=3) Heap Fetches: 0 -> Parallel Hash (cost=12990.05..12990.05 rows=268619 width=8) (actual time=113.746..113.747 rows=214895 loops=3) Buckets: 262144 (originally 262144) Batches: 16 (originally 8) Memory Usage: 12352kB -> Parallel Index Only Scan using "User_id_idx" on "User" u (cost=0.42..12990.05 rows=268619 width=8) (actual tim e=0.046..50.956 rows=214895 loops=3) Heap Fetches: 0 Planning Time: 0.929 ms Execution Time: 5019.400 ms (34 rows) psql-17.5-5432-pgsql=# ``` **Original Cypher query without Book endpoint** ``` psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), id(u) ORDER BY count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------- Limit (cost=34041184.37..34041184.40 rows=10 width=64) (actual time=82382.918..82407.803 rows=10 loops=1) -> Sort (cost=34041184.37..34042796.09 rows=644686 width=64) (actual time=82382.916..82407.800 rows=10 loops=1) Sort Key: ((count(*))::agtype) DESC Sort Method: top-N heapsort Memory: 26kB -> GroupAggregate (cost=1256126.22..34027252.94 rows=644686 width=64) (actual time=25704.595..82308.264 rows=132661 loops=1) Group Key: age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), u.properties)) -> Nested Loop (cost=1256126.22..21684786.36 rows=2465914571 width=32) (actual time=25702.797..79040.874 rows=10000000 loops=1) -> Gather Merge (cost=1256126.21..2420813.48 rows=10000193 width=118) (actual time=25702.610..40552.783 rows=10000000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=1255126.18..1265543.05 rows=4166747 width=118) (actual time=25658.964..33652.909 rows=3333333 loops=3) Sort Key: (age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), u.properties))) Sort Method: external merge Disk: 483984kB Worker 0: Sort Method: external merge Disk: 491056kB Worker 1: Sort Method: external merge Disk: 502712kB -> Parallel Hash Join (cost=21619.93..284274.14 rows=4166747 width=118) (actual time=1935.223..13026.837 rows=3333333 loops=3 ) Hash Cond: (_age_default_alias_0.start_id = u.id) -> Parallel Seq Scan on "HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747 width=16) (actual tim e=0.050..845.514 rows=3333333 loops=3) -> Parallel Hash (cost=13802.19..13802.19 rows=268619 width=110) (actual time=318.674..318.675 rows=214895 loops=3) Buckets: 65536 (originally 65536) Batches: 32 (originally 16) Memory Usage: 9824kB -> Parallel Seq Scan on "User" u (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.160..199.981 rows=214 895 loops=3) -> Memoize (cost=0.01..1.54 rows=5 width=8) (actual time=0.001..0.001 rows=1 loops=10000000) Cache Key: _age_default_alias_0.end_id Cache Mode: logical Hits: 9907873 Misses: 92127 Evictions: 17229 Overflows: 0 Memory Usage: 8193kB -> Append (cost=0.00..1.53 rows=5 width=8) (actual time=0.011..0.020 rows=1 loops=92127) -> Seq Scan on _ag_label_vertex _age_default_alias_1_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.000..0.000 rows=0 loop s=92127) Filter: (id = _age_default_alias_0.end_id) -> Index Only Scan using "User_id_idx" on "User" _age_default_alias_1_2 (cost=0.42..0.44 rows=1 width=8) (actual time=0.005.. 0.005 rows=0 loops=92127) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_book_id on "Book" _age_default_alias_1_3 (cost=0.29..0.31 rows=1 width=8) (actual time=0.004..0. 005 rows=1 loops=92127) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_author_id on "Author" _age_default_alias_1_4 (cost=0.29..0.31 rows=1 width=8) (actual time=0.003 ..0.003 rows=0 loops=92127) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_review_id on "Review" _age_default_alias_1_5 (cost=0.43..0.45 rows=1 width=8) (actual time=0.005 ..0.005 rows=0 loops=92127) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 Planning Time: 1.476 ms Execution Time: 82440.110 ms (42 rows) psql-17.5-5432-pgsql=# ``` **Original Cypher query with Book endpoint** ``` psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), id(u) ORDER BY count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------- Limit (cost=1007580.65..1007580.67 rows=10 width=64) (actual time=18222.030..18248.143 rows=10 loops=1) -> Sort (cost=1007580.65..1009192.36 rows=644686 width=64) (actual time=18222.029..18248.140 rows=10 loops=1) Sort Key: ((count(*))::agtype) DESC Sort Method: top-N heapsort Memory: 26kB -> Finalize GroupAggregate (cost=823871.40..993649.21 rows=644686 width=64) (actual time=17789.320..18166.135 rows=132661 loops=1) Group Key: (age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), u.properties))) -> Gather Merge (cost=823871.40..974308.63 rows=1289372 width=40) (actual time=17789.302..17949.998 rows=132923 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=822871.37..824483.09 rows=644686 width=40) (actual time=17762.365..17772.000 rows=44308 loops=3) Sort Key: (age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), u.properties))) Sort Method: quicksort Memory: 3503kB Worker 0: Sort Method: quicksort Memory: 3432kB Worker 1: Sort Method: quicksort Memory: 3393kB -> Partial HashAggregate (cost=674786.11..743035.36 rows=644686 width=40) (actual time=17462.389..17476.932 rows=44308 loops=3) Group Key: age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id), u.properties)) Planned Partitions: 16 Batches: 1 Memory Usage: 6673kB Worker 0: Batches: 1 Memory Usage: 6417kB Worker 1: Batches: 1 Memory Usage: 6417kB -> Parallel Hash Join (cost=23994.82..342748.46 rows=4166747 width=32) (actual time=2755.273..14367.058 rows=3333333 loops=3) Hash Cond: (_age_default_alias_0.start_id = u.id) -> Parallel Hash Join (cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.622..1915.252 rows=3333333 loops=3 ) Hash Cond: (_age_default_alias_0.end_id = _age_default_alias_1.id) -> Parallel Seq Scan on "HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747 width=16) (actu al time=0.024..519.098 rows=3333333 loops=3) -> Parallel Hash (cost=1888.44..1888.44 rows=38916 width=8) (actual time=17.200..17.201 rows=31133 loops=3) Buckets: 131072 Batches: 1 Memory Usage: 4736kB -> Parallel Index Only Scan using idx_book_id on "Book" _age_default_alias_1 (cost=0.29..1888.44 rows=38916 width=8) (actual time=0.070..7.131 rows=31133 loops=3) Heap Fetches: 0 -> Parallel Hash (cost=13802.19..13802.19 rows=268619 width=110) (actual time=118.590..118.590 rows=214895 loops=3) Buckets: 65536 (originally 65536) Batches: 32 (originally 16) Memory Usage: 9792kB -> Parallel Seq Scan on "User" u (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.046..44.205 rows=2148 95 loops=3) Planning Time: 2.422 ms Execution Time: 18251.645 ms (33 rows) psql-17.5-5432-pgsql=# ``` **Cypher query without Book endpoint using the new vars** ``` psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), u_idc ORDER BY count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------- Subquery Scan on _ (cost=6539203.76..6539203.81 rows=10 width=64) (actual time=10232.518..10232.605 rows=10 loops=1) -> Limit (cost=6539203.76..6539203.79 rows=10 width=40) (actual time=10232.514..10232.595 rows=10 loops=1) -> Sort (cost=6539203.76..6540815.48 rows=644686 width=40) (actual time=10232.512..10232.592 rows=10 loops=1) Sort Key: ((count(*))::agtype) DESC Sort Method: top-N heapsort Memory: 26kB -> Finalize GroupAggregate (cost=786218.01..6525272.33 rows=644686 width=40) (actual time=2704.329..10140.579 rows=132661 loops=1) Group Key: u.id -> Gather Merge (cost=786218.01..6510766.89 rows=1289372 width=16) (actual time=2704.314..10002.115 rows=133999 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial GroupAggregate (cost=785217.99..6360941.35 rows=644686 width=16) (actual time=2400.315..9630.184 rows=44666 loops=3) Group Key: u.id -> Nested Loop (cost=785217.99..1217172.46 rows=1027464405 width=8) (actual time=2399.584..9121.389 rows=3333333 loops=3) -> Merge Join (cost=785217.98..885067.31 rows=4166747 width=16) (actual time=2399.385..5151.825 rows=3333333 loops=3) Merge Cond: (_age_default_alias_0.start_id = u.id) -> Sort (cost=785081.51..795498.38 rows=4166747 width=16) (actual time=2399.313..3120.343 rows=3333333 loops=3) Sort Key: _age_default_alias_0.start_id Sort Method: external merge Disk: 87984kB Worker 0: Sort Method: external merge Disk: 83192kB Worker 1: Sort Method: external merge Disk: 83280kB -> Parallel Seq Scan on "HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747 width=16) (actual time=0.045..842.639 rows=3333333 loops=3) -> Materialize (cost=0.42..18362.43 rows=644686 width=8) (actual time=0.059..560.710 rows=3933332 loops=3) -> Index Only Scan using "User_id_idx" on "User" u (cost=0.42..16750.72 rows=644686 width=8) (actual time=0 .056..142.077 rows=644665 loops=3) Heap Fetches: 0 -> Memoize (cost=0.01..1.54 rows=5 width=8) (actual time=0.001..0.001 rows=1 loops=10000000) Cache Key: _age_default_alias_0.end_id Cache Mode: logical Hits: 3386794 Misses: 70936 Evictions: 0 Overflows: 0 Memory Usage: 7759kB Worker 0: Hits: 3200070 Misses: 69560 Evictions: 0 Overflows: 0 Memory Usage: 7609kB Worker 1: Hits: 3201422 Misses: 71218 Evictions: 0 Overflows: 0 Memory Usage: 7790kB -> Append (cost=0.00..1.53 rows=5 width=8) (actual time=0.010..0.019 rows=1 loops=211714) -> Seq Scan on _ag_label_vertex _age_default_alias_1_1 (cost=0.00..0.00 rows=1 width=8) (actual time=0.000. .0.000 rows=0 loops=211714) Filter: (id = _age_default_alias_0.end_id) -> Index Only Scan using "User_id_idx" on "User" _age_default_alias_1_2 (cost=0.42..0.44 rows=1 width=8) (a ctual time=0.004..0.004 rows=0 loops=211714) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_book_id on "Book" _age_default_alias_1_3 (cost=0.29..0.31 rows=1 width=8) (act ual time=0.004..0.004 rows=1 loops=211714) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_author_id on "Author" _age_default_alias_1_4 (cost=0.29..0.31 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=211714) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 -> Index Only Scan using idx_review_id on "Review" _age_default_alias_1_5 (cost=0.43..0.45 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=211714) Index Cond: (id = _age_default_alias_0.end_id) Heap Fetches: 0 Planning Time: 1.860 ms Execution Time: 10250.143 ms (47 rows) psql-17.5-5432-pgsql=# ``` **Cypher query with Book endpoint using the new vars** ``` psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), u_idc ORDER BY count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------- Subquery Scan on _ (cost=830343.44..830343.49 rows=10 width=64) (actual time=5091.056..5118.571 rows=10 loops=1) -> Limit (cost=830343.44..830343.47 rows=10 width=40) (actual time=5091.052..5118.561 rows=10 loops=1) -> Sort (cost=830343.44..831955.16 rows=644686 width=40) (actual time=5091.050..5118.558 rows=10 loops=1) Sort Key: ((count(*))::agtype) DESC Sort Method: top-N heapsort Memory: 26kB -> Finalize GroupAggregate (cost=651469.34..816412.01 rows=644686 width=40) (actual time=4851.058..5033.469 rows=132661 loops=1) Group Key: u.id -> Gather Merge (cost=651469.34..801906.57 rows=1289372 width=16) (actual time=4851.043..4917.321 rows=135554 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=650469.31..652081.03 rows=644686 width=16) (actual time=4840.942..4849.306 rows=45185 loops=3) Sort Key: u.id Sort Method: quicksort Memory: 2962kB Worker 0: Sort Method: quicksort Memory: 2914kB Worker 1: Sort Method: quicksort Memory: 2969kB -> Partial HashAggregate (cost=538245.23..577244.80 rows=644686 width=16) (actual time=4803.703..4818.569 rows=45185 loops=3) Group Key: u.id Planned Partitions: 8 Batches: 1 Memory Usage: 6673kB Worker 0: Batches: 1 Memory Usage: 6673kB Worker 1: Batches: 1 Memory Usage: 6673kB -> Parallel Hash Join (cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2694.604..3822.223 rows=3333333 loop s=3) Hash Cond: (_age_default_alias_0.start_id = u.id) -> Parallel Hash Join (cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.696..1863.307 rows=3333333 l oops=3) Hash Cond: (_age_default_alias_0.end_id = _age_default_alias_1.id) -> Parallel Seq Scan on "HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747 width=16) (actual time=0.043..505.487 rows=3333333 loops=3) -> Parallel Hash (cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.889..16.890 rows=31133 loops=3) Buckets: 131072 Batches: 1 Memory Usage: 4736kB -> Parallel Index Only Scan using idx_book_id on "Book" _age_default_alias_1 (cost=0.29..1888.44 rows =38916 width=8) (actual time=0.068..7.123 rows=31133 loops=3) Heap Fetches: 0 -> Parallel Hash (cost=12990.05..12990.05 rows=268619 width=8) (actual time=115.445..115.446 rows=214895 loops=3) Buckets: 262144 (originally 262144) Batches: 16 (originally 8) Memory Usage: 12352kB -> Parallel Index Only Scan using "User_id_idx" on "User" u (cost=0.42..12990.05 rows=268619 width=8) (actu al time=0.056..51.793 rows=214895 loops=3) Heap Fetches: 0 Planning Time: 1.453 ms Execution Time: 5123.253 ms (35 rows) psql-17.5-5432-pgsql=# ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org