ysvoon commented on issue #2198:
URL: https://github.com/apache/age/issues/2198#issuecomment-3153047731

   > [@ysvoon](https://github.com/ysvoon) What I can say about your query above 
is that it is not scalable for larger datasets; let me see if I can explain 
why.
   > 
   > Basically, what you are asking for, in pseudocode, is the following:
   > 
   > ```
   > for row in cypher_array
   > {
   >     for a in Person
   >     {
   >         for b in Person
   >         {
   >             if (a.StartId == row.StartId && b.EndId == row.EndId)
   >             {
   >                 create ...
   >             }
   >         }
   >     }
   > }
   > ```
   > 
   > The problem with this query is that it amounts to a function with a 
runtime of **O(m*n^2)**. As you can see, as **n** (the size of Persons) gets 
larger, the amount of processing grows with the square of **n** and is then 
multiplied by **m**, the number of rows in **cypher_array**. While this is fine 
for smaller datasets, it isn't for larger ones.
   > 
   > Edit: I'm not sure if indexes can help much here, but I'm not an expert.
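
To make the O(m*n^2) estimate in the quoted pseudocode concrete, here is a minimal Python sketch that counts the work done by the nested scan and compares it with a keyed lookup. It is only an illustration of the arithmetic: the `id`, `start_id`, and `end_id` field names and both helper functions are hypothetical, and nothing here claims to reflect the plan that AGE or PostgreSQL actually chooses.

```python
def nested_scan(rows, persons):
    """Mirror the quoted pseudocode: test every (a, b) pair for every row."""
    edges, comparisons = [], 0
    for row in rows:
        for a in persons:
            for b in persons:
                comparisons += 1
                if a["id"] == row["start_id"] and b["id"] == row["end_id"]:
                    edges.append((a["id"], b["id"]))
    return edges, comparisons


def keyed_lookup(rows, persons):
    """Build a dict keyed on id once, then resolve both endpoints per row."""
    by_id = {p["id"]: p for p in persons}  # one O(n) pass
    edges, lookups = [], 0
    for row in rows:
        lookups += 2
        a = by_id.get(row["start_id"])
        b = by_id.get(row["end_id"])
        if a is not None and b is not None:
            edges.append((a["id"], b["id"]))
    return edges, lookups


# Hypothetical data: n = 50 persons, m = 20 edge rows.
persons = [{"id": i} for i in range(50)]
rows = [{"start_id": i, "end_id": (i + 1) % 50} for i in range(20)]

slow_edges, comparisons = nested_scan(rows, persons)
fast_edges, lookups = keyed_lookup(rows, persons)
print(comparisons, lookups, slow_edges == fast_edges)
```

With these sizes the nested scan performs m * n * n comparisons while the keyed version performs 2 * m lookups after a single pass over the persons, and both produce the same edge list.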
   
   
   Yes, I understand that this is most likely the case. However, when we tried 
the query without `cypher_array` and/or `UNWIND`, executing either single 
`CREATE` queries or batched `CREATE` queries (of size 100) in one go, we 
observed that performance was definitely slower, even with indexing.
   
   How would you recommend we modify our query to improve the execution time 
for such large datasets, specifically for edge creation? Vertex creation was 
fairly fast; we only observe this performance issue with edge creation, and we 
cannot use the CSV file load method.
   
   Thank you for any advice you have for us. :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@age.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
