[I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

via GitHub Mon, 04 Aug 2025 02:31:12 -0700


ysvoon opened a new issue, #2198:
URL: https://github.com/apache/age/issues/2198


   ### Questions:
   1. How can I create index for graph tables?
   2. How effective are they? I am using a gin index right now on properties, 
and I can see that the performance does improve for the initial 200k edge 
creations, but the performance starts to degrade severely after that, does 
creating indexes on specifically the key that I am using in the MATCH clause of 
my edge creation query help?
   3. I have also tried to skip to the 200k offset and start my edge creations 
from this point, but the performance is still poor for the rows after the 200k 
mark onwards, does this mean there is a limitation to indexes? How can I insert 
8 million edges to my table within a reasonable time period?
   
   ### What we have tried:
   #### Question 1
   I have referred to Issue https://github.com/apache/age/issues/2137 to create 
btree and gin indexes, so far only this works for me:
   
   ```
   CREATE INDEX index_name ON schema_name."table_name" USING gin (properties);
   ```
   
   and the following returns syntax errors:
   
   ```
   # trying to create gin index on more specific key within properties that I 
am using in the MATCH clause for edge creation query
   CREATE INDEX IF NOT EXISTS index_name
   ON schema_name."table_name" USING gin ((properties -> 'some_id'));
   
   # trying to create btree index in general
   CREATE INDEX idx_btree_name
   ON test."NodeType1"
   USING btree (agtype_access_operator(VARIADIC ARRAY[properties, 
'"name"'::agtype]));
   ```
   
   error:
   ```
   ERROR:  operator does not exist: ag_catalog.agtype -> unknown
   
   ERROR:  type "agtype" does not exist
   ```
   ---
   
   #### Question 2 & 3
   Our query looks something like this:
   ```
   SELECT * FROM cypher('schema_name', $$
                       WITH {cypher_array} AS rows
                       UNWIND rows AS row
                       MATCH (a:Person {PersonId: row.StartId}),
                             (b:Person {PersonId: row.EndId})
                       CREATE (a)-[e:{edge_name} {{
                           StartId: row.StartId,
                           EndId: row.EndId,
                           otherprops: row.other_props
                       }}]->(b)
                   $$) AS (e agtype);
   ```
   
   and we are doing one insertion for every 100 rows, for an offset of 100k 
rows from the original graph table that has 8 million rows. Initially it would 
take under 2-3s for every 100 rows, but starting from the 200k offset, the time 
would increase up to 100 - 1000s for every 100 rows.
   
   The query plan tells me it seems to be using the index I've created `Bitmap 
Index Scan on idx_something...`, and I am at a loss as to what other things I 
can try to improve the performance of the edge creation queries.
   
   If you could help us understand this, it would be a great help, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

Reply via email to