Zainab-Saad commented on issue #982: URL: https://github.com/apache/age/issues/982#issuecomment-1752059252
Tested with around 1048567 edges, 12637040 and 9960576 two differently labelled vertices. Label redesign branch chooses normal seq scan over parallel seq scan for simple edge finding `MATCH(:Person)-[e:EDGE]->(:Title)` In the following queries, I have first tested without changing any configuration parameters, but after that changed the `parallel_tuple_cost` parameter to 0 in order to force a parallel seq scan for testing. This resulted in less than half the execution time as before. ``` agedb=# SHOW parallel_tuple_cost; parallel_tuple_cost --------------------- 0.1 (1 row) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->() RETURN e $$) AS (result agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Seq Scan on "EDGE" e (cost=0.00..33392.76 rows=1048567 width=32) (actual time=0.047..4619.293 rows=1048567 loops=1) Filter: (start_label_id = 3) Planning Time: 0.220 ms Execution Time: 4670.069 ms (4 rows) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title) RETURN e $$) AS (result agtype); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Seq Scan on "EDGE" e (cost=0.00..36014.17 rows=1048567 width=32) (actual time=0.045..4615.328 rows=1048567 loops=1) Filter: ((start_label_id = 3) AND (end_label_id = 5)) Planning Time: 0.150 ms Execution Time: 4665.713 ms (4 rows) agedb=# SET parallel_tuple_cost TO 0; SET agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->() RETURN e $$) AS (result agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..20630.32 rows=1048567 width=32) (actual time=0.561..1606.535 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..19630.32 rows=436903 width=32) (actual time=0.246..1543.629 rows=349522 loops=3) Filter: (start_label_id = 3) Planning Time: 0.147 ms Execution Time: 1660.394 ms (7 rows) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title) RETURN e $$) AS (result agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..21722.57 rows=1048567 width=32) (actual time=0.401..1622.878 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..20722.57 rows=436903 width=32) (actual time=0.219..1554.710 rows=349522 loops=3) Filter: ((start_label_id = 3) AND (end_label_id = 5)) Planning Time: 0.110 ms Execution Time: 1676.656 ms (7 rows) agedb=# SHOW parallel_setup_cost; parallel_setup_cost --------------------- 1000 (1 row) agedb=# SET parallel_setup_cost TO 10; SET agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->() RETURN e $$) AS (result agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Gather (cost=10.00..19640.32 rows=1048567 width=32) (actual time=0.471..1604.505 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..19630.32 rows=436903 width=32) (actual time=0.347..1540.499 rows=349522 loops=3) Filter: (start_label_id = 3) Planning Time: 0.147 ms Execution Time: 1658.601 ms (7 rows) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title) RETURN e $$) AS (result agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Gather (cost=10.00..20732.57 rows=1048567 width=32) (actual time=0.341..1617.122 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..20722.57 rows=436903 width=32) (actual time=0.240..1553.976 rows=349522 loops=3) Filter: ((start_label_id = 3) AND (end_label_id = 5)) Planning Time: 0.110 ms Execution Time: 1672.127 ms (7 rows) ``` Below is the QPT for these queries run on AGE (without label redesign and the HEAD of the branch is at commit `b4574f5df8de721eed0ed67a2788075b387af418` which is the base of the [label_redesign](https://github.com/rafsun42/age/tree/label_redesign) branch) I have first tested without changing any configuration parameters, but after that changed the `max_parallel_workers_per_gather ` to 0 in order to force the normal seq scan instead of parallel scan (did this for comparison of execution times with the above QPTs). ``` agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->() RETURN e $$) AS (result agtype); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..15799.77 rows=5243 width=32) (actual time=0.343..875.155 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..14275.47 rows=2185 width=32) (actual time=0.414..819.148 rows=349522 loops=3) Filter: ((_extract_label_id(start_id))::integer = 3) Planning Time: 0.095 ms Execution Time: 929.516 ms (7 rows) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title) RETURN e $$) AS (result agtype); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Gather (cost=1000.00..17451.71 rows=26 width=32) (actual time=0.491..879.482 rows=1048567 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on "EDGE" e (cost=0.00..16449.11 rows=11 width=32) (actual time=3.546..829.526 rows=349522 loops=3) Filter: (((_extract_label_id(start_id))::integer = 3) AND ((_extract_label_id(end_id))::integer = 4)) Planning Time: 0.137 ms Execution Time: 933.877 ms (7 rows) agedb=# SHOW max_parallel_workers_per_gather; max_parallel_workers_per_gather --------------------------------- 2 (1 row) agedb=# SET max_parallel_workers_per_gather TO 0; SET agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->() RETURN e $$) AS (result agtype); QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Seq Scan on "EDGE" e (cost=0.00..23465.72 rows=5243 width=32) (actual time=0.088..2397.991 rows=1048567 loops=1) Filter: ((_extract_label_id(start_id))::integer = 3) Planning Time: 0.115 ms Execution Time: 2448.949 ms (4 rows) agedb=# SELECT * FROM cypher('imdb', $$ EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title) RETURN e $$) AS (result agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on "EDGE" e (cost=0.00..28682.47 rows=26 width=32) (actual time=0.081..2464.521 rows=1048567 loops=1) Filter: (((_extract_label_id(start_id))::integer = 3) AND ((_extract_label_id(end_id))::integer = 4)) Planning Time: 0.127 ms Execution Time: 2515.350 ms (4 rows) ``` @rafsun42 @CapnSpek Do you think is it something for which we should further look into code to improve this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org