WendelLana commented on issue #1004: URL: https://github.com/apache/age/issues/1004#issuecomment-1701107227
## Benchmarking of label IDs in an array with multi-labels In this benchmarking, I conducted tests to assess the performance of a vertex table with an integer column representing a single label for vertices, comparing it against a vertex table with an integer array representing the multi-label feature. Additionally, I utilized an index for the integer array to enhance performance. ### Setup **Creation of tables** ```sql CREATE TABLE labels_test.ag_label_vertex ( id bigint NOT NULL, label_id INT4, PRIMARY KEY (id) ); CREATE TABLE labels_test.ag_label_vertex_array ( id bigint NOT NULL, label_ids INT4[], PRIMARY KEY (id) ); ``` **Populating the tables** - Created 1000000 (one million) vertices (rows) for each table. - The label_id array can contain 0 to 5 label IDs. - Label IDs range from 1 to 10. **Table 1 sample - ag_label_vertex** ``` SELECT * FROM labels_test.ag_label_vertex LIMIT 10; id | label_id ----+---------- 1 | 10 2 | 1 3 | 10 4 | 2 5 | 1 6 | 10 7 | 3 8 | 4 9 | 2 10 | 5 (10 rows) ``` **Table 2 sample - ag_label_vertex_array** ``` SELECT * FROM labels_test.ag_label_vertex_array LIMIT 10; id | label_ids ----+-------------- 1 | {2,9,4,10,2} 2 | {2,3} 3 | {9,4,7} 4 | {2,10,5} 5 | {8,10,6} 6 | {6} 7 | {9} 8 | {1,7} 9 | {} 10 | {} (10 rows) ``` ### Queries that are tested **Query 1 - MATCH (:1)** ```sql EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex WHERE label_id = 1; EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex_array WHERE '{1}'::int4[] <@ label_ids; ``` **Query 2 - MATCH (:1|2)** ```sql EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex WHERE label_id = 1 OR label_id = 2; EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex_array WHERE '{1}'::int4[] <@ label_ids OR '{2}'::int4[] <@ label_ids; ``` **Query 3 - MATCH (:!1)** ```sql EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex WHERE label_id != 1; EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex_array WHERE NOT '{1}'::int4[] <@ label_ids; ``` `From here on, there will only be queries for the multi-label array table.` **Query 4 - MATCH (:1:2)** ```sql EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex_array WHERE '{1,2}'::int4[] <@ label_ids; ``` **Query 5 - MATCH (:1&2)** ```sql EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM labels_test.ag_label_vertex_array WHERE label_ids = '{1,2}'::int4[]; ``` ### Indices I've created a table to test the queries with GIN index. ```sql CREATE INDEX idx_label_array ON labels_test.ag_label_vertex_array_gin USING gin(label_ids); ``` ### Performance tests Execution time of each query for the table ag_label_vertex (Integer Column) and ag_label_vertex_array (Array Column), also a test with GIN index. The first table shows the best and worst execution times obtained from executing the queries, while the second table displays the average execution time for each query. <table> <thead> <tr> <th></th> <th>Integer Column</th> <th>Array Column</th> <th>Array with Index</th> </tr> </thead> <tbody> <tr> <td rowspan=2>Query 1</td> <td> 85.516 ms </td> <td> 207.132 ms </td> <td> 61.038 ms </td> </tr> <tr> <td> 235.642 ms </td> <td> 385.092 ms </td> <td> 211.111 ms </td> </tr> <tr> <td rowspan=2>Query 2</td> <td> 95.306 ms </td> <td> 299.130 ms </td> <td> 97.431 ms </td> </tr> <tr> <td> 216.760 ms </td> <td> 417.860 ms </td> <td> 270.313 ms </td> </tr> <tr> <td rowspan=2>Query 3</td> <td> 247.242 ms </td> <td> 246.839 ms </td> <td> 247.569 ms </td> </tr> <tr> <td> 280.002 ms </td> <td> 369.022 ms </td> <td> 407.235 ms </td> </tr> <tr> <td rowspan=2>Query 4</td> <td> ------------- </td> <td> 119.549 ms </td> <td> 40.732 ms </td> </tr> <tr> <td> ------------- </td> <td> 277.031 ms </td> <td> 225.804 ms </td> </tr> <tr> <td rowspan=2>Query 5</td> <td> ------------- </td> <td> 84.104 ms </td> <td> 43.549 ms </td> </tr> <tr> <td> ------------- </td> <td> 312.020 ms </td> <td> 255.996 ms </td> </tr> </tbody> </table> **Average Execution Time** | | Integer Column | Array Column | Array with Index | |---|---|---|---| | Query 1 | 150.1548 ms | 247.0602 ms | 94.0968 ms | | Query 2 | 123.0912 ms | 324.2656 ms | 134.56 ms | | Query 3 | 256.2982 ms | 273.6316 ms | 292.5652 ms | | Query 4 | ------------- | 157.3674 ms | 79.4852 ms | | Query 5 | ------------- | 135.9024 ms | 90.8026 ms | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org