FANNG1 opened a new issue, #10367:
URL: https://github.com/apache/gravitino/issues/10367
The Spark Paimon connector currently does not expose distribution support in its connector capabilities or docs, even though the core path already has most of the building blocks.
Background:
- The Spark Paimon IT disables clustered-by tests:
  - `spark-connector/spark-common/src/test/java/org/apache/gravitino/spark/connector/integration/test/paimon/SparkPaimonCatalogIT.java:47`
- The Spark connector already converts bucket transforms to a Gravitino distribution and passes it into `createTable`:
  - `spark-connector/spark-common/src/main/java/org/apache/gravitino/spark/connector/SparkTransformConverter.java:140`
  - `spark-connector/spark-common/src/main/java/org/apache/gravitino/spark/connector/catalog/BaseCatalog.java:209`
- The Paimon backend already validates and applies HASH distribution:
  - `catalogs/catalog-lakehouse-paimon/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/PaimonCatalogOperations.java:505`
- The current doc says Spark Paimon does not support distribution:
  - `docs/spark-connector/spark-catalog-paimon.md:25`
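Conceptually, the conversion the background points at maps a Spark `CLUSTERED BY (cols) INTO n BUCKETS` bucket transform onto a Gravitino HASH distribution over the same columns. A minimal, self-contained sketch of that mapping (the `BucketTransform` and `Distribution` records below are simplified stand-ins, not the real Spark or Gravitino types):

```java
import java.util.List;

public class BucketToDistributionSketch {

  // Simplified stand-in for a Spark bucket transform.
  record BucketTransform(int numBuckets, List<String> columns) {}

  // Simplified stand-in for a Gravitino distribution.
  record Distribution(String strategy, int number, List<String> expressions) {}

  // CLUSTERED BY (cols) INTO n BUCKETS -> HASH distribution over the same columns.
  static Distribution toDistribution(BucketTransform bucket) {
    return new Distribution("hash", bucket.numBuckets(), bucket.columns());
  }

  public static void main(String[] args) {
    // CLUSTERED BY (id) INTO 4 BUCKETS
    Distribution dist = toDistribution(new BucketTransform(4, List.of("id")));
    System.out.println(dist.strategy() + " " + dist.number() + " " + dist.expressions());
    // prints: hash 4 [id]
  }
}
```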
Suggested SQL for validation:
```sql
USE paimon_catalog.default;
CREATE TABLE dist_tbl (
  id INT,
  name STRING,
  address STRING
) USING paimon
CLUSTERED BY (id) INTO 4 BUCKETS;
DESC TABLE EXTENDED dist_tbl;
SHOW TBLPROPERTIES dist_tbl;
```
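After the `CREATE TABLE` above, the ITs could assert that bucketing was persisted as Paimon table options. A hedged sketch of the expected metadata, assuming Paimon's standard `bucket` / `bucket-key` option names (the exact keys should be verified against what the backend actually writes):

```java
import java.util.Map;

public class ExpectedBucketMetadata {

  // Options we expect `CLUSTERED BY (id) INTO 4 BUCKETS` to produce
  // on the created Paimon table (assumed option names).
  static Map<String, String> expectedOptions(int numBuckets, String bucketKey) {
    return Map.of(
        "bucket", Integer.toString(numBuckets),
        "bucket-key", bucketKey);
  }

  public static void main(String[] args) {
    Map<String, String> expected = expectedOptions(4, "id");
    // A real IT would load the table via the catalog after create
    // and compare its options against this map.
    System.out.println(expected.get("bucket") + " / " + expected.get("bucket-key"));
    // prints: 4 / id
  }
}
```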
How should we improve?
- Enable distribution capability for Spark Paimon and add integration
coverage.
- Ensure `CLUSTERED BY (...) INTO N BUCKETS` is persisted as Paimon bucket
metadata.
- Add/enable ITs to verify table metadata after create/load.
- Update Spark Paimon docs to mark distribution as supported.
Out of scope:
- Sort orders (`SORTED BY`) in this issue.
- Row-level operations (`DELETE/UPDATE/MERGE`) in this issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]