Re: [PR] [#9757][followup] fix(clickhouse-catalog): fix creating distribute table error. [gravitino]

via GitHub Tue, 03 Mar 2026 05:23:06 -0800


Copilot commented on code in PR #10088:
URL: https://github.com/apache/gravitino/pull/10088#discussion_r2878249536



##########
docs/jdbc-clickhouse-catalog.md:
##########
@@ -191,6 +191,7 @@ Other ClickHouse types are exposed as [External 
Type](./manage-relational-metada
 :::note
 - `settings.*` keys are passed to the ClickHouse `SETTINGS` clause verbatim.  
 - The `engine` value is immutable after creation.
+- When loading the table meta, Gravitino can't distinguish whether it's a 
cluster table or a local table as properties like `cluster-name` and 
`on-cluster` can't be fetched from the JDBC metadata currently. 

Review Comment:
   In this note, “table meta” should be “table metadata”, and consider using 
“cannot” instead of “can't” to keep docs tone consistent and formal. The 
sentence could also be tightened for readability (it’s currently a bit 
long/run-on).
   ```suggestion
   - When loading table metadata, Gravitino cannot determine whether it is a 
cluster table or a local table, because properties such as `cluster-name` and 
`on-cluster` are not available from the JDBC metadata. 
   ```



##########
catalogs-contrib/catalog-jdbc-clickhouse/src/main/java/org/apache/gravitino/catalog/clickhouse/operations/ClickHouseTableOperations.java:
##########
@@ -343,22 +345,20 @@ private void handleDistributeTable(
     Preconditions.checkArgument(
         StringUtils.isNotBlank(shardingKey), "Sharding key must be specified 
for Distributed");
 
-    List<String> shardingColumns = 
ClickHouseTableSqlUtils.extractShardingKeyColumns(shardingKey);
-    if (CollectionUtils.isNotEmpty(shardingColumns)) {
-      for (String columnName : shardingColumns) {
-        JdbcColumn shardingColumn = findColumn(columns, columnName);
-        Preconditions.checkArgument(
-            shardingColumn != null,
-            "Sharding key column %s must be defined in the table",
-            columnName);
-      }
-    }
+    // When creating a distributed table, the columns should be empty as the 
distributed table will
+    // reference the remote table's columns, and the sharding key must be 
defined in the columns.
+    Preconditions.checkArgument(
+        ArrayUtils.isEmpty(columns),
+        "Columns should be empty when creating a distributed table, but got: 
%s",
+        Arrays.toString(columns));

Review Comment:
   The inline comment is contradictory: it says the sharding key must be 
defined in the columns, but this method now enforces `columns` to be empty for 
distributed tables. Please update/remove that part of the comment so it matches 
the new behavior.



##########
catalogs-contrib/catalog-jdbc-clickhouse/src/main/java/org/apache/gravitino/catalog/clickhouse/operations/ClickHouseTableOperations.java:
##########
@@ -343,22 +345,20 @@ private void handleDistributeTable(
     Preconditions.checkArgument(
         StringUtils.isNotBlank(shardingKey), "Sharding key must be specified 
for Distributed");
 
-    List<String> shardingColumns = 
ClickHouseTableSqlUtils.extractShardingKeyColumns(shardingKey);
-    if (CollectionUtils.isNotEmpty(shardingColumns)) {
-      for (String columnName : shardingColumns) {
-        JdbcColumn shardingColumn = findColumn(columns, columnName);
-        Preconditions.checkArgument(
-            shardingColumn != null,
-            "Sharding key column %s must be defined in the table",
-            columnName);
-      }
-    }
+    // When creating a distributed table, the columns should be empty as the 
distributed table will
+    // reference the remote table's columns, and the sharding key must be 
defined in the columns.
+    Preconditions.checkArgument(
+        ArrayUtils.isEmpty(columns),
+        "Columns should be empty when creating a distributed table, but got: 
%s",
+        Arrays.toString(columns));
 

Review Comment:
   Enforcing distributed-table creation to require an empty `columns` array 
means `JdbcCatalogOperations.createTable(...)` will return a `Table` with an 
empty schema (it returns the input `columns` without re-loading from the 
database). This is a behavior change for API consumers and can be surprising. 
Consider either (a) allowing columns to be provided (and just skipping index/PK 
generation for Distributed), or (b) making the catalog operation load the 
created table from ClickHouse when engine=Distributed so the returned `Table` 
has the real columns.
   ```suggestion
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [#9757][followup] fix(clickhouse-catalog): fix creating distribute table error. [gravitino]

Reply via email to