xaoyun opened a new issue, #4240: URL: https://github.com/apache/amoro/issues/4240
### Search before asking - [x] I have searched in the [issues](https://github.com/apache/amoro/issues?q=is%3Aissue) and found no similar issues. ### What would you like to be improved? ## 正文草稿 ```markdown Hi Amoro community, We found two compatibility issues between Amoro Iceberg REST Catalog and Trino Iceberg connector. These issues make Trino DDL unreliable: `CREATE TABLE` may return an error to the client, while the table is actually created successfully in Amoro and can be queried afterward. ## Environment - Amoro: 0.6.1 - Object storage: MinIO / S3-compatible storage - Query engine: Trino - Catalog type: Iceberg REST Catalog - Amoro REST Catalog URI: ```properties iceberg.rest-catalog.uri=http://<amoro-host>:1630/api/iceberg/rest iceberg.rest-catalog.warehouse=lakehouse_catalog ``` Trino catalog config example: ```properties connector.name=iceberg iceberg.catalog.type=rest iceberg.rest-catalog.uri=http://<amoro-host>:1630/api/iceberg/rest iceberg.rest-catalog.warehouse=lakehouse_catalog iceberg.unique-table-location=true iceberg.file-format=PARQUET fs.native-s3.enabled=true s3.endpoint=http://<minio-host>:9000 s3.region=us-east-1 s3.path-style-access=true s3.aws-access-key=*** s3.aws-secret-key=*** ``` ## Issue 1: namespace properties are not preserved or returned When creating a namespace with `location`: ```sql CREATE SCHEMA iceberg_minio.dwd_stage_v2 WITH ( location = 's3://enerpoch-pub-bucket/warehouse/dwd_stage_v2' ); ``` or calling the Iceberg REST API directly: ```bash curl -s -X POST \ "http://<amoro-host>:1630/api/iceberg/rest/v1/catalogs/lakehouse_catalog/namespaces" \ -H "Content-Type: application/json" \ -d '{ "namespace": ["diag_stage_loc"], "properties": { "location": "s3://enerpoch-pub-bucket/warehouse/diag_stage_loc" } }' ``` Amoro returns: ```json { "namespace": ["diag_stage_loc"], "properties": {} } ``` Loading the namespace also returns empty properties: ```bash curl -s \ "http://<amoro-host>:1630/api/iceberg/rest/v1/catalogs/lakehouse_catalog/namespaces/diag_stage_loc" ``` Response: ```json { "namespace": ["diag_stage_loc"], "properties": {} } ``` Because of this, Trino cannot create a table without an explicit table-level location: ```sql CREATE TABLE iceberg_minio.diag_stage_loc.test_tbl ( id BIGINT ) WITH ( format = 'PARQUET' ); ``` Trino reports: ```text location must be set for diag_stage_loc ``` Expected behavior: Amoro should preserve and return namespace properties, at least the `location` property: ```json { "namespace": ["diag_stage_loc"], "properties": { "location": "s3://enerpoch-pub-bucket/warehouse/diag_stage_loc" } } ``` ## Issue 2: CREATE TABLE with explicit location returns non-empty location error, but the table is created successfully When creating a table with an explicit location: ```sql CREATE TABLE iceberg_minio.dwd_stage.aaa ( id BIGINT, name VARCHAR, create_time TIMESTAMP(6) ) WITH ( format = 'PARQUET', location = 's3://enerpoch-pub-bucket/warehouse/dwd_stage/aaa' ); ``` Trino reports: ```text Cannot create a table on a non-empty location: s3://enerpoch-pub-bucket/warehouse/dwd_stage/aaa, set 'iceberg.unique-table-location=true' in your Iceberg catalog properties to use unique table locations for every table. ``` However, the table is actually created successfully: - The table exists in Amoro. - Iceberg metadata files exist in object storage. - Trino can query the table afterward. So the DDL result seen by the client is failure, while the server-side result is success. ## Source code analysis ### Amoro 0.6.1 In Amoro 0.6.1, `IcebergRestCatalogService#createTable` creates Iceberg metadata and then registers the table in AMS. Related files: ```text ams/server/src/main/java/com/netease/arctic/server/IcebergRestCatalogService.java ams/server/src/main/java/com/netease/arctic/server/utils/IcebergTableUtil.java ``` The create table flow writes the initial metadata file: ```java OutputFile outputFile = io.newOutputFile(metadataFileLocation); TableMetadataParser.overwrite(icebergTableMetadata, outputFile); ``` The generated metadata path is under the final table location: ```text <table-location>/metadata/v00001-xxx.metadata.json ``` After the REST call returns, Trino checks whether the table location is empty. Since Amoro has already written the metadata file, Trino sees the location as non-empty and throws the error. ### Higher versions checked We also checked the official Apache Amoro source code for: - `v0.8.1-incubating` - `v0.9.0-rc3` - current `master` The namespace properties issue still exists. `setNamespaceProperties` still throws: ```java throw new UnsupportedOperationException("namespace properties is not supported"); ``` The create table path was refactored to `InternalTableCreator`, but the behavior is still similar. Related file: ```text amoro-ams/src/main/java/org/apache/amoro/server/table/internal/InternalIcebergCreator.java ``` The initial metadata is still written to the final table location: ```java String icebergMetadataFileLocation = InternalTableUtil.genNewMetadataFileLocation(null, icebergMetadata); OutputFile outputFile = io.newOutputFile(icebergMetadataFileLocation); TableMetadataParser.overwrite(icebergMetadata, outputFile); ``` We did not find special handling for: ```java request.stageCreate() ``` Therefore, upgrading from 0.6.1 to 0.8.1 or 0.9.0-rc3 does not appear to resolve this Trino compatibility issue. ## Why this seems better fixed in Amoro Trino's non-empty location check is a safety check to avoid creating a table on top of existing files or existing Iceberg metadata. The problem is that Amoro writes metadata into the final table location before Trino finishes its create table flow. Therefore, relaxing Trino's check would reduce safety for all catalogs, while fixing Amoro would make the REST Catalog behavior more compatible and predictable. ### How should we improve? ## Suggested improvements Could Amoro consider supporting the following behavior? 1. Preserve namespace properties passed by `CreateNamespaceRequest.properties()`. 2. Return namespace properties from the load namespace endpoint. 3. Implement the namespace properties update endpoint. 4. Respect Iceberg REST `CreateTableRequest.stageCreate()` semantics. 5. Avoid writing initial metadata into the final table location during staged create. 6. Ensure create table is transactional or at least idempotent from the REST client perspective. Expected result: - `CREATE SCHEMA ... WITH (location=...)` should preserve `location`. - `CREATE TABLE` without explicit table-level location should work when namespace has a location. - `CREATE TABLE ... WITH (location=...)` should not return a failure after the table has already been created successfully. Thanks. ``` ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Subtasks _No response_ ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
