ysvoon opened a new issue, #2198: URL: https://github.com/apache/age/issues/2198
### Questions: 1. How can I create index for graph tables? 2. How effective are they? I am using a gin index right now on properties, and I can see that the performance does improve for the initial 200k edge creations, but the performance starts to degrade severely after that, does creating indexes on specifically the key that I am using in the MATCH clause of my edge creation query help? 3. I have also tried to skip to the 200k offset and start my edge creations from this point, but the performance is still poor for the rows after the 200k mark onwards, does this mean there is a limitation to indexes? How can I insert 8 million edges to my table within a reasonable time period? ### What we have tried: #### Question 1 I have referred to Issue https://github.com/apache/age/issues/2137 to create btree and gin indexes, so far only this works for me: ``` CREATE INDEX index_name ON schema_name."table_name" USING gin (properties); ``` and the following returns syntax errors: ``` # trying to create gin index on more specific key within properties that I am using in the MATCH clause for edge creation query CREATE INDEX IF NOT EXISTS index_name ON schema_name."table_name" USING gin ((properties -> 'some_id')); # trying to create btree index in general CREATE INDEX idx_btree_name ON test."NodeType1" USING btree (agtype_access_operator(VARIADIC ARRAY[properties, '"name"'::agtype])); ``` error: ``` ERROR: operator does not exist: ag_catalog.agtype -> unknown ERROR: type "agtype" does not exist ``` --- #### Question 2 & 3 Our query looks something like this: ``` SELECT * FROM cypher('schema_name', $$ WITH {cypher_array} AS rows UNWIND rows AS row MATCH (a:Person {PersonId: row.StartId}), (b:Person {PersonId: row.EndId}) CREATE (a)-[e:{edge_name} {{ StartId: row.StartId, EndId: row.EndId, otherprops: row.other_props }}]->(b) $$) AS (e agtype); ``` and we are doing one insertion for every 100 rows, for an offset of 100k rows from the original graph table that has 8 million rows. Initially it would take under 2-3s for every 100 rows, but starting from the 200k offset, the time would increase up to 100 - 1000s for every 100 rows. The query plan tells me it seems to be using the index I've created `Bitmap Index Scan on idx_something...`, and I am at a loss as to what other things I can try to improve the performance of the edge creation queries. If you could help us understand this, it would be a great help, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org