[
https://issues.apache.org/jira/browse/IGNITE-22757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavel Pereslegin updated IGNITE-22757:
--------------------------------------
Fix Version/s: 3.0.0-beta2
> Excessive memory usage in schema-related code in SQL
> ----------------------------------------------------
>
> Key: IGNITE-22757
> URL: https://issues.apache.org/jira/browse/IGNITE-22757
> Project: Ignite
> Issue Type: Bug
> Components: sql
> Reporter: Roman Puchkovskiy
> Assignee: Maksim Zhuravkov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Attachments: image-2024-07-17-14-13-29-683.png
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> I have the following test:
>
> {noformat}
> class It1000TablesTest extends ClusterPerTestIntegrationTest {
>     private static final String DEFAULT_STORAGE_ENGINE = "<default>";
>
>     private final String storageEngine = "aipersist";
>
>     private final int columnsCount = 200;
>
>     @Override
>     protected int initialNodes() {
>         return 1;
>     }
>
>     @Test
>     void test() {
>         String storageProfile = DEFAULT_STORAGE_ENGINE.equals(storageEngine)
>                 ? DEFAULT_STORAGE_PROFILE
>                 : "default_" + storageEngine.toLowerCase();
>
>         String zoneSql = "create zone test_zone with replicas=1, storage_profiles='"
>                 + storageProfile + "';";
>
>         cluster.doInSession(0, session -> {
>             executeUpdate(zoneSql, session);
>         });
>
>         for (int i = 0; i < 1000; i++) {
>             String tableName = String.format("table%03d", i);
>
>             String valColumns = columnNames()
>                     .map(colName -> colName + " varchar(40)")
>                     .collect(joining(", "));
>
>             String tableSql = "create table " + tableName + " (key int primary key, " + valColumns + ")"
>                     + " with primary_zone='TEST_ZONE', storage_profile='" + storageProfile + "';";
>
>             String columnNames = columnNames().collect(joining(", "));
>
>             String values = IntStream.range(0, columnsCount)
>                     .mapToObj(n -> UUID.randomUUID().toString())
>                     .map(s -> "'" + s + "'")
>                     .collect(joining(", "));
>
>             String insertSql = "insert into " + tableName + " (key, " + columnNames
>                     + ") values (" + i + ", " + values + ")";
>
>             cluster.doInSession(0, session -> {
>                 executeUpdate(tableSql, session);
>                 executeUpdate(insertSql, session);
>             });
>
>             int createdTables = i + 1;
>             if (createdTables % 1 == 0) {
>                 log.info("XXX Created " + createdTables + " tables");
>             }
>         }
>     }
>
>     private Stream<String> columnNames() {
>         return IntStream.range(0, columnsCount)
>                 .mapToObj(n -> String.format("val%03d", n));
>     }
> }
> {noformat}
> It just tries to create 1000 tables with 201 columns each (all sharing the
> same zone), inserting one row into each table right after creating it.
>
> After creating about 200 tables I took a heap dump, here are the top
> consumers of the heap:
> !image-2024-07-17-14-13-29-683.png!
> There are only around 200 tables, but 20k IgniteTableImpl instances and more
> than 4M CatalogColumnDescriptor instances. It looks like an arithmetic
> progression: the sum of 1..200 is (1+200)*100≈20000, as if adding a new
> table made a copy of all existing tables as well.
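
A quick check of the arithmetic above, as a minimal sketch. It assumes that each table creation leaves one more copy of every table that exists at that point, and that each table has 201 columns (200 varchar columns plus the key). The class and method names are made up for this illustration and are not Ignite code.

```java
final class TableCopyEstimate {
    /** Sum of 1..tables: the k-th CREATE TABLE is assumed to materialize k table instances. */
    static long cachedTableInstances(int tables) {
        long total = 0;
        for (int k = 1; k <= tables; k++) {
            total += k;
        }
        return total;
    }

    /** Column descriptors, assuming every table instance carries its own descriptors. */
    static long columnDescriptors(int tables, int columnsPerTable) {
        return cachedTableInstances(tables) * columnsPerTable;
    }

    public static void main(String[] args) {
        System.out.println(cachedTableInstances(200));   // prints 20100
        System.out.println(columnDescriptors(200, 201)); // prints 4040100
    }
}
```

Under these assumptions the estimate gives about 20k table instances and about 4M column descriptors, which lines up with the figures read from the heap dump.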
> SqlSchemaManager caches tables by the <tableId, catalogVersion> pair, so if
> the catalog is modified N times in ways that do not concern a table, N copies
> of that table might be created in the cache (and they do get created). It
> seems natural to cache tables (perhaps in addition to the existing caching) by
> <tableId, tableVersion>.
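
To illustrate the difference between the two cache keys, here is a small self-contained sketch. It is not the SqlSchemaManager implementation: `Key`, `Table` and `instancesCached` are hypothetical stand-ins, and the model assumes one lookup of the same unchanged table per catalog version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class CacheKeySketch {
    /** Hypothetical cache key: a table id paired with some version number. */
    static final class Key {
        final int tableId;
        final int version;

        Key(int tableId, int version) {
            this.tableId = tableId;
            this.version = version;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return tableId == other.tableId && version == other.version;
        }

        @Override
        public int hashCode() {
            return Objects.hash(tableId, version);
        }
    }

    /** Hypothetical stand-in for a materialized table object. */
    static final class Table {
        final int tableId;

        Table(int tableId) {
            this.tableId = tableId;
        }
    }

    /** Number of instances cached for one unchanged table after the given number of catalog versions. */
    static int instancesCached(int catalogVersions, boolean keyByTableVersion) {
        Map<Key, Table> cache = new HashMap<>();
        int tableId = 1;
        int tableVersion = 1; // stays the same: no catalog update touches this table

        for (int catalogVersion = 1; catalogVersion <= catalogVersions; catalogVersion++) {
            int version = keyByTableVersion ? tableVersion : catalogVersion;
            cache.computeIfAbsent(new Key(tableId, version), k -> new Table(k.tableId));
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(instancesCached(200, false)); // prints 200
        System.out.println(instancesCached(200, true));  // prints 1
    }
}
```

In this model, keying by catalog version yields one instance per catalog change, while keying by table version keeps a single instance until the table itself changes.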
> Another problem is that, as the cache is bounded, it starts forgetting older
> instances, so they get recreated; but those older instances are actually used
> by some internal structures of Calcite, so, even if tables are properly
> cached by <tableId, tableVersion>, duplicates will emerge when enough tables
> are created.
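
The eviction problem can be modelled the same way. The sketch below uses an access-ordered `LinkedHashMap` as a stand-in for a bounded cache (the actual cache used by Ignite may behave differently) and a plain list as a stand-in for the internal structures that keep referencing older instances. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BoundedCacheSketch {
    /** Hypothetical stand-in for a materialized table; identity equality is intentional. */
    static final class Table {
        final int tableId;

        Table(int tableId) {
            this.tableId = tableId;
        }
    }

    /** Returns the cached instance or creates and caches a new one on a miss. */
    static Table lookup(Map<Integer, Table> cache, int tableId) {
        Table table = cache.get(tableId);
        if (table == null) {
            table = new Table(tableId);
            cache.put(tableId, table);
        }
        return table;
    }

    /** Counts distinct live instances of table 0 after tablesCreated tables pass through the cache. */
    static int liveInstancesOfFirstTable(final int maxEntries, int tablesCreated) {
        Map<Integer, Table> cache = new LinkedHashMap<Integer, Table>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Table> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };

        // Strong references held outside the cache, so evicted instances stay on the heap.
        List<Table> heldElsewhere = new ArrayList<>();
        heldElsewhere.add(lookup(cache, 0));

        for (int id = 1; id < tablesCreated; id++) {
            lookup(cache, id);
        }

        // If table 0 was evicted in the meantime, this lookup creates a second instance.
        heldElsewhere.add(lookup(cache, 0));

        return (int) heldElsewhere.stream().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(liveInstancesOfFirstTable(3, 5));  // prints 2
        System.out.println(liveInstancesOfFirstTable(10, 5)); // prints 1
    }
}
```

Once more tables pass through the cache than it can hold, the same table ends up with two live instances even though the key never changed, which is the duplication described above.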
--
This message was sent by Atlassian Jira
(v8.20.10#820010)