sandy-bes opened a new pull request, #2351: URL: https://github.com/apache/age/pull/2351
Motivation / Problem:

Load testing revealed a significant performance degradation in insertion scenarios. The scenarios were taken from an open-source benchmark (LDBC SNB Interactive v1) and rewritten in pure SQL. Example queries:
1) https://github.com/ldbc/ldbc_snb_interactive_v1_impls/blob/main/cypher/queries/interactive-update-1.cypher
2) https://github.com/ldbc/ldbc_snb_interactive_v1_impls/blob/main/cypher/queries/interactive-update-6.cypher
3) https://github.com/ldbc/ldbc_snb_interactive_v1_impls/blob/main/cypher/queries/interactive-update-7.cypher

Analysis showed that the main bottleneck is the `entity_exists` function. The root cause is its use of a sequential scan (`SeqScan`) to check whether an entity exists before insertion. A `SeqScan` runs in O(N) time, so the lookup cost grows linearly with the number of rows in the table: the larger the graph became, the longer each individual insertion took. As a result, TPS dropped regardless of the concurrency level (the issue was consistently reproduced with both 1 and 30 threads).

Changes Made:
- `entity_exists` now uses an index scan (`IndexScan`, O(log N)) instead of a sequential scan.
- Other functions that relied on `SeqScan` were refactored to use `IndexScan` wherever applicable.

Performance Impact:
Benchmarks were run on a server with 30 CPU cores and 32 GB of RAM, using graphs of 20,000 to 200,000 objects and a 2-minute test duration. Switching to index access completely eliminated the degradation tied to data volume growth:
- Before: ~1,500 TPS at peak, with subsequent degradation as the table grew.
- After: a stable ~15,000 TPS (a 10x speedup).

Acknowledgments:
- Huge thanks to Daria Barsukova for conducting the load testing and isolating the issue.
- Implementation of index scanning: Alexandra Bondar.
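To make the O(N) vs O(log N) argument concrete, here is a minimal Python sketch. It is an analogy, not AGE code: a sorted list of integers stands in for a label table's entity ids, a binary search stands in for an index lookup, and `exists_seqscan` / `exists_indexscan` are hypothetical names, not AGE or PostgreSQL functions. The actual change lives in AGE's C implementation of `entity_exists`, which this sketch does not model.

```python
def exists_seqscan(ids, key):
    """Sequential-scan style check: compare rows one by one.

    Returns (found, comparisons). Cost grows linearly with len(ids).
    """
    comparisons = 0
    for value in ids:
        comparisons += 1
        if value == key:
            return True, comparisons
    return False, comparisons


def exists_indexscan(sorted_ids, key):
    """Index-lookup style check (hypothetical stand-in for a B-tree probe).

    Binary search over sorted ids; returns (found, comparisons).
    Needs at most floor(log2(n)) + 1 comparisons.
    """
    comparisons = 0
    lo, hi = 0, len(sorted_ids)  # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_ids[mid] == key:
            return True, comparisons
        if sorted_ids[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return False, comparisons


if __name__ == "__main__":
    # Graph sizes taken from the benchmark description; ids are hypothetical.
    for n in (20_000, 200_000):
        ids = list(range(1, n + 1))  # already sorted
        key = n  # worst case for the sequential scan: the last row
        _, seq = exists_seqscan(ids, key)
        _, idx = exists_indexscan(ids, key)
        print(f"n={n:>7}: seqscan {seq} comparisons, index lookup {idx} comparisons")
```

For the worst-case lookup (the last id), the sequential check needs 20,000 and 200,000 comparisons for the two graph sizes, a tenfold increase per existence check, while the binary search needs at most 15 and 18. A real B-tree has a much higher fan-out than a binary search, so its depth grows even more slowly, but the logarithmic shape is the same.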
