[jira] [Updated] (IGNITE-22757) Excessive memory usage in schema-related code in SQL

Pavel Pereslegin (Jira) Wed, 07 Aug 2024 06:51:09 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-22757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Pavel Pereslegin updated IGNITE-22757:
--------------------------------------
    Fix Version/s: 3.0.0-beta2

> Excessive memory usage in schema-related code in SQL
> ----------------------------------------------------
>
>                 Key: IGNITE-22757
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22757
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>            Reporter: Roman Puchkovskiy
>            Assignee: Maksim Zhuravkov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>         Attachments: image-2024-07-17-14-13-29-683.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I have the following test:
>  
> {noformat}
> class It1000TablesTest extends ClusterPerTestIntegrationTest {
>     private static final String DEFAULT_STORAGE_ENGINE = "<default>";
>     private final String storageEngine = "aipersist";
>     private final int columnsCount = 200;
>     @Override
>     protected int initialNodes() {
>         return 1;
>     }
>     @Test
>     void test() {
>         String storageProfile =
>                 DEFAULT_STORAGE_ENGINE.equals(storageEngine) ? 
> DEFAULT_STORAGE_PROFILE : "default_" + storageEngine.toLowerCase();
>         String zoneSql = "create zone test_zone with replicas=1, 
> storage_profiles='" + storageProfile + "';";
>         cluster.doInSession(0, session -> {
>             executeUpdate(zoneSql, session);
>         });
>         for (int i = 0; i < 1000; i++) {
>             String tableName = String.format("table%03d", i);
>             String valColumns = columnNames()
>                     .map(colName -> colName + " varchar(40)")
>                     .collect(joining(", "));
>             String tableSql = "create table " + tableName + " (key int 
> primary key, " + valColumns + ")"
>                     + " with primary_zone='TEST_ZONE', storage_profile='" + 
> storageProfile + "';";
>             String columnNames = columnNames().collect(joining(", "));
>             String values = IntStream.range(0, columnsCount)
>                     .mapToObj(n -> UUID.randomUUID().toString())
>                     .map(s -> "'" + s + "'")
>                     .collect(joining(", "));
>             String insertSql = "insert into " + tableName + " (key, " + 
> columnNames + ") values (" + i + ", " + values + ")";
>             cluster.doInSession(0, session -> {
>                 executeUpdate(tableSql, session);
>                 executeUpdate(insertSql, session);
>             });
>             int createdTables = i + 1;
>             if (createdTables % 1 == 0) {
>                 log.info("XXX Created " + createdTables + " tables");
>             }
>         }
>     }
>     private Stream<String> columnNames() {
>         return IntStream.range(0, columnsCount)
>                 .mapToObj(n -> String.format("val%03d", n));
>     }
> }
> {noformat}
> It just tries to create a 1000 of tables, 201 column each (sharing the same 
> zone), making an insert to each of them after creating it.
>  
> After creating about 200 tables I took a heap dump, here are the top 
> consumers of the heap:
> !image-2024-07-17-14-13-29-683.png!
> There are just around tables, but 20k IgniteTableImpl instances and more than 
> 4M CatalogColumnDescriptor instances. It feels like an arithmetic 
> progression: sum of 1..200 gives (1+200)*100≈20000, as if addition of a new 
> table made a copy of all existing tables as well.
> SqlSchemaManager caches tables by <tableId, catalogVersion> pair, so, if a 
> catalog is modified N times in a way that does not concern a table, N copies 
> of the table might be created in the cache (and they do get created). It 
> seems natural to cache tables (maybe additionally to the existing caching) by 
> <tableId, tableVersion>.
> Another problem is that, as the cache is bounded, it starts forgetting older 
> instances, so they get recreated; but those older instances are actually used 
> by some internal structures of Calcite, so, even if tables are properly 
> cached by <tableId, tableVersion>, duplicates will emerge when anough tables 
> are created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-22757) Excessive memory usage in schema-related code in SQL

Reply via email to