ysvoon commented on issue #2198: URL: https://github.com/apache/age/issues/2198#issuecomment-3153047731
> [@ysvoon](https://github.com/ysvoon) What I can say about your query above is that it is not scalable for larger datasets; let me see if I can explain why,...
>
> Basically, what you are asking, in pseudo code, is the following -
>
> ```
> for row in cypher_array
> {
>     for a in Person
>     {
>         for b in Person
>         {
>             if (a.StartId == row.StartId && b.EndId == row.EndId)
>             {
>                 create ...
>             }
>         }
>     }
> }
> ```
>
> The problem with this query is that it amounts to a function with a runtime of **O(m*n^2)**. As you can see, as **n** (the size of Person) gets larger, the amount of processing is squared and then multiplied by **m**, the number of rows in **cypher_array**. While this is fine for smaller datasets, it isn't for larger ones.
>
> Edit: I'm not sure if indexes can help much here, but I'm not an expert.

Yes, I understand that this is most likely the case. However, when we tried the query without `cypher_array` and/or `UNWIND`, executing either single `CREATE` statements or batched `CREATE` statements (of size 100) in one go, the performance was definitely slower, even with indexing.

How would you recommend we modify our query to improve the execution time on such large datasets, specifically for edge creation? Vertex creation was fairly fast; we only observe this performance issue when creating edges, and we cannot use the CSV file load method. A sketch of the kind of query we are running is included below for reference.

Thank you for any advice you have for us. :)
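For reference, here is a minimal sketch of the UNWIND-based edge-creation pattern described above, using AGE's documented prepared-statement parameter syntax. The graph name (`my_graph`), edge label (`CONNECTED_TO`), and property names (taken from the pseudocode above) are placeholders for illustration; our actual identifiers differ:

```sql
-- Minimal sketch of the batched edge-creation pattern discussed above.
-- 'my_graph' and 'CONNECTED_TO' are placeholder names; the StartId/EndId
-- properties mirror the pseudocode in the quoted comment. $rows is the
-- batch of rows, passed via AGE's third cypher() argument.
PREPARE create_edges(agtype) AS
SELECT * FROM cypher('my_graph', $$
    UNWIND $rows AS row
    MATCH (a:Person {StartId: row.StartId}), (b:Person {EndId: row.EndId})
    CREATE (a)-[:CONNECTED_TO]->(b)
$$, $1) AS (result agtype);

-- Executed once per batch, e.g.:
-- EXECUTE create_edges('{"rows": [{"StartId": 1, "EndId": 2}]}');
```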