ysvoon commented on issue #2198: URL: https://github.com/apache/age/issues/2198#issuecomment-3160397531
> This is due to the 2 MATCH commands used in creating the edges, as you are building your edges from the Person table; this is a O(n^2) operation. > > You might want to try the MERGE command. I see, I've tried the MERGE command today and there are a few issues I have observed: 1. The performance is alright, but again after >200k entries it starts to slow down. In the beginning it was taking just a few seconds to create 100 edges, but after the >200k mark, it takes >1 minute to create 100 edges. Was thinking could it be that after a certain point, indexing won't help and for MERGE, where it still does a `Seq Scan` on the edges, maybe this is why the performance worsens with time? 2. If I use the MERGE command like this, it creates duplicated vertices, so I don't think this is the correct way. ``` SELECT * FROM cypher('schema_name', $$ MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person {PersonId: '<redacted>'}) SET e.StartId = '<redacted>', e.EndId = '<redacted>' RETURN e $$) as (e agtype); ``` 3. So I tried the MERGE command as recommended here https://github.com/apache/age/issues/1517, but it doesn't create the properties (returned empty), and I had to run this same query for the second time for the properties to show up, same as this issue here https://github.com/apache/age/issues/1907, although it says here that it has been resolved, I am seeing this issue still. ``` SELECT * FROM cypher('schema_name', $$ MERGE (a: Person {PersonId: '<redacted>'}) MERGE (b: Person {PersonId: '<redacted>'}) MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person {PersonId: '<redacted>'}) SET e.StartId = '<redacted>', e.EndId = '<redacted>' RETURN e $$) as (e agtype); ``` Query Plan: ``` "Custom Scan (Cypher Set) (cost=0.00..0.00 rows=0 width=32) (actual time=822.972..822.984 rows=1 loops=1)" " -> Subquery Scan on _age_default_alias_previous_cypher_clause (cost=0.00..0.00 rows=1 width=32) (actual time=822.903..822.915 rows=1 loops=1)" " -> Custom Scan (Cypher Merge) (cost=0.00..0.00 rows=0 width=448) (actual time=822.901..822.911 rows=1 loops=1)" " -> Subquery Scan on _age_default_alias_previous_cypher_clause_1 (cost=0.00..1475455.11 rows=73770921 width=32) (actual time=0.180..0.190 rows=1 loops=1)" " -> Hash Right Join (cost=0.00..737745.90 rows=73770921 width=192) (actual time=0.180..0.188 rows=1 loops=1)" " Hash Cond: ((e.start_id = (age_id(_age_default_alias_previous_cypher_clause_2.a))::graphid) AND (e.end_id = (age_id(_age_default_alias_previous_cypher_clause_2.b))::graphid))" " Join Filter: ((age_properties(_age_default_alias_previous_cypher_clause_2.a) @> '{""PersonId"": ""<redacted>""}'::agtype) AND (age_properties(_age_default_alias_previous_cypher_clause_2.b) @> '{""PersonId"": ""<redacted""}'::agtype))" " -> Seq Scan on ""edge_name"" e (cost=0.00..24.55 rows=970 width=48) (actual time=0.002..0.002 rows=0 loops=1)" " -> Hash (cost=0.00..0.00 rows=0 width=64) (actual time=0.168..0.172 rows=1 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 9kB" " -> Custom Scan (Cypher Merge) (cost=0.00..0.00 rows=0 width=64) (actual time=0.158..0.162 rows=1 loops=1)" " -> Subquery Scan on _age_default_alias_previous_cypher_clause_2 (cost=130.57..1321589.78 rows=73770921 width=64) (actual time=0.158..0.162 rows=1 loops=1)" " -> Nested Loop Left Join (cost=130.57..583880.57 rows=73770921 width=128) (actual time=0.157..0.161 rows=1 loops=1)" " -> Custom Scan (Cypher Merge) (cost=0.00..0.00 rows=0 width=32) (actual time=0.071..0.072 rows=1 loops=1)" " -> Subquery Scan on _age_default_alias_previous_cypher_clause_3 (cost=130.57..30641.61 rows=8589 width=32) (actual time=0.070..0.072 rows=1 loops=1)" " -> Bitmap Heap Scan on ""Person"" a (cost=130.57..30555.72 rows=8589 width=64) (actual time=0.070..0.071 rows=1 loops=1)" " Recheck Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)" " Heap Blocks: exact=1" " -> Bitmap Index Scan on index_name (cost=0.00..128.42 rows=8589 width=0) (actual time=0.061..0.061 rows=1 loops=1)" " Index Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)" " -> Bitmap Heap Scan on ""Person"" b (cost=130.57..30512.77 rows=8589 width=32) (actual time=0.083..0.084 rows=1 loops=1)" " Recheck Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)" " Heap Blocks: exact=1" " -> Bitmap Index Scan on index_name (cost=0.00..128.42 rows=8589 width=0) (actual time=0.079..0.080 rows=1 loops=1)" " Index Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)" "Planning Time: 0.390 ms" "Execution Time: 823.126 ms" ``` > Without understanding your use case or dataset, it is hard to suggest alternatives. We have 3 vertex graph tables, and they are each in the volume of around 8M nodes as of now, and we need to create edges to represent the relationships between them. Currently, there are 3 types of edges that needs to be created, and each edge graph table will create around 8M edges. There are 3 such edge graph tables, so in total there are approximately >24M edges to be created. Currently, it still seems too slow with the MERGE clause and I am unsure if this is normal, but would there be any other best practices that we can try to apply to speed up this process? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org