danhuawang opened a new issue, #10443: URL: https://github.com/apache/gravitino/issues/10443
### Version

main branch

### Describe what's wrong

`"type": "unparsed"` is returned when registering a Delta table that includes complex datatypes to Gravitino.

<img width="838" height="838" alt="Image" src="https://github.com/user-attachments/assets/24b38285-09ba-4a2a-92ef-8daa8052dd3b" />

### Error message and/or stacktrace

```
26/03/16 19:13:31 INFO SparkContext: Running Spark version 3.4.3
26/03/16 19:13:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
26/03/16 19:13:31 INFO ResourceUtils: ==============================================================
26/03/16 19:13:31 INFO ResourceUtils: No custom resources configured for spark.driver.
26/03/16 19:13:31 INFO ResourceUtils: ==============================================================
26/03/16 19:13:31 INFO SparkContext: Submitted application: Delta Table Test
26/03/16 19:13:31 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
26/03/16 19:13:31 INFO ResourceProfile: Limiting resource is cpu
26/03/16 19:13:31 INFO ResourceProfileManager: Added ResourceProfile id: 0
26/03/16 19:13:31 INFO SecurityManager: Changing view acls to: wangdanhua,hdfs
26/03/16 19:13:31 INFO SecurityManager: Changing modify acls to: wangdanhua,hdfs
26/03/16 19:13:31 INFO SecurityManager: Changing view acls groups to:
26/03/16 19:13:31 INFO SecurityManager: Changing modify acls groups to:
26/03/16 19:13:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: wangdanhua, hdfs; groups with view permissions: EMPTY; users with modify permissions: wangdanhua, hdfs; groups with modify permissions: EMPTY
26/03/16 19:13:31 INFO Utils: Successfully started service 'sparkDriver' on port 53352.
26/03/16 19:13:31 INFO SparkEnv: Registering MapOutputTracker
26/03/16 19:13:31 INFO SparkEnv: Registering BlockManagerMaster
26/03/16 19:13:31 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/03/16 19:13:31 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
26/03/16 19:13:31 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
26/03/16 19:13:31 INFO DiskBlockManager: Created local directory at /private/var/folders/wn/rgsz3fqj32x87q719rfmfh9r0000gn/T/blockmgr-3f6cb7f5-aea0-47b7-a050-738ff7a0a8da
26/03/16 19:13:31 INFO MemoryStore: MemoryStore started with capacity 127.2 MiB
26/03/16 19:13:31 INFO SparkEnv: Registering OutputCommitCoordinator
26/03/16 19:13:31 INFO Executor: Starting executor ID driver on host wangdanhuadembp
26/03/16 19:13:31 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
26/03/16 19:13:31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53353.
26/03/16 19:13:31 INFO NettyBlockTransferService: Server created on localhost 127.0.0.1:53353
26/03/16 19:13:31 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
26/03/16 19:13:31 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 53353, None)
26/03/16 19:13:31 INFO BlockManagerMasterEndpoint: Registering block manager localhost:53353 with 127.2 MiB RAM, BlockManagerId(driver, localhost, 53353, None)
26/03/16 19:13:31 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 53353, None)
26/03/16 19:13:31 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 53353, None)
Spark session initialized for Delta table operations
26/03/16 19:13:34 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
Complex datatypes Delta table created at: /tmp/delta-datatypes/complex
When create complex datatypes delta table with spark delta_datatype_complex in schema delta_datatype_schema catalog delta_datatype_test_catalog at location /tmp/delta-datatypes/complex # com.datastrato.test.steps.DeltaTableSteps.createComplexDatatypesDeltaTableWithSpark(java.lang.String,java.lang.String,java.lang.String,java.lang.String)
Then verify delta table created successfully # com.datastrato.test.steps.DeltaTableSteps.verifyDeltaTableCreatedSuccessfully()
Request method: POST
Request URI:    http://127.0.0.1:18090/api/metalakes/delta_test_metalake/catalogs/delta_datatype_test_catalog/schemas/delta_datatype_schema/tables
Proxy:          <none>
Request params: <none>
Query params:   <none>
Form params:    <none>
Path params:    <none>
Headers:        Accept=application/vnd.gravitino.v1+json
                Authorization=Basic YW5vbnltb3VzOnRlc3Q=
                Content-Type=application/json
Cookies:        <none>
Multiparts:     <none>
Body:
{
  "columns": [
    {
      "nullable": true,
      "name": "string_array",
      "comment": "array of strings",
      "type": "list<string>"
    },
    {
      "nullable": true,
      "name": "string_map",
      "comment": "map of string to int",
      "type": "map<string,integer>"
    },
    {
      "nullable": true,
      "name": "person_struct",
      "comment": "person struct",
      "type": "struct<name:string,age:integer>"
    }
  ],
  "name": "delta_datatype_complex",
  "comment": "Delta table with complex datatypes",
  "properties": {
    "external": "true",
    "format": "delta",
    "location": "/tmp/delta-datatypes/complex"
  }
}
When register complex datatypes delta table delta_datatype_complex at location /tmp/delta-datatypes/complex in schema delta_datatype_schema catalog delta_datatype_test_catalog # com.datastrato.test.steps.DeltaTableSteps.registerComplexDatatypesDeltaTable(java.lang.String,java.lang.String,java.lang.String,java.lang.String)
Then check response code 200 message properties # com.datastrato.test.steps.MetalakeSteps.verifyResponseCodeMessage(int,java.lang.String)
When load table delta_datatype_complex in schema delta_datatype_schema catalog delta_datatype_test_catalog # com.datastrato.test.steps.DeltaTableSteps.loadTable(java.lang.String,java.lang.String,java.lang.String)
[DEBUG] Column 'string_array' Gravitino raw type JSON: {"type":"unparsed","unparsedType":"list<string>"}
[DEBUG] Available Spark tables:
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
+---------+---------+-----------+
[DEBUG] Spark schema for table at /tmp/delta-datatypes/complex:
root
 |-- string_array: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- string_map: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)
 |-- person_struct: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- age: integer (nullable = true)
[DEBUG] Spark column: string_array -> array<string>
[DEBUG] >>> Target column 'string_array' Spark type: array<string> (catalogString: array<string>)
[DEBUG] Spark column: string_map -> map<string,int>
[DEBUG] Spark column: person_struct -> struct<name:string,age:int>
Then verify gravitino table complex datatypes mapping: # com.datastrato.test.steps.DeltaTableSteps.verifyGravitinoTableComplexDatatypesMapping(io.cucumber.datatable.DataTable)
  | Gravitino Type                  | Delta/Spark Type        | Column Name   |
  | list<string>                    | ArrayType(StringType)   | string_array  |
  | map<string,integer>             | MapType(String,Integer) | string_map    |
  | struct<name:string,age:integer> | StructType              | person_struct |
org.opentest4j.AssertionFailedError: Column string_array type mismatch: expected list<string>, got unparsed ==> expected: <true> but was: <false>
	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at
org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
	at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
	at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
	at com.datastrato.test.steps.DeltaTableSteps.verifyDatatypesMapping(DeltaTableSteps.java:803)
	at com.datastrato.test.steps.DeltaTableSteps.verifyGravitinoTableComplexDatatypesMapping(DeltaTableSteps.java:766)
```

### How to reproduce

1. Create a Delta table with Spark:

```java
String createTableSQL = String.format(
    "CREATE TABLE delta.`%s` ("
        + "string_array ARRAY<STRING>, "
        + "string_map MAP<STRING,INT>, "
        + "person_struct STRUCT<name:STRING,age:INT>"
        + ") USING DELTA",
    location);
sparkSession.sql(createTableSQL);
```

2. Register the table in Gravitino:

`POST {{host}}/api/metalakes/:metalake/catalogs/:catalog/schemas/:schema/tables`

```
{
  "columns": [
    {
      "nullable": true,
      "name": "string_array",
      "comment": "array of strings",
      "type": "list<string>"
    },
    {
      "nullable": true,
      "name": "string_map",
      "comment": "map of string to int",
      "type": "map<string,integer>"
    },
    {
      "nullable": true,
      "name": "person_struct",
      "comment": "person struct",
      "type": "struct<name:string,age:integer>"
    }
  ],
  "name": "delta_datatype_complex",
  "comment": "Delta table with complex datatypes",
  "properties": {
    "external": "true",
    "format": "delta",
    "location": "/tmp/delta-datatypes/complex"
  }
}
```

### Additional context

_No response_
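One observation, hedged as an assumption based on my reading of the Gravitino datatype documentation rather than anything verified against the main branch: `unparsed` is what the server reports when it cannot parse a column type, and the REST API may expect complex types to be sent as structured JSON objects rather than as type strings like `list<string>`. Under that assumption, the three columns in the request body would look something like:

```
{
  "columns": [
    {
      "nullable": true,
      "name": "string_array",
      "comment": "array of strings",
      "type": { "type": "list", "containsNull": true, "elementType": "string" }
    },
    {
      "nullable": true,
      "name": "string_map",
      "comment": "map of string to int",
      "type": { "type": "map", "keyType": "string", "valueType": "integer", "valueContainsNull": true }
    },
    {
      "nullable": true,
      "name": "person_struct",
      "comment": "person struct",
      "type": {
        "type": "struct",
        "fields": [
          { "name": "name", "type": "string", "nullable": true },
          { "name": "age", "type": "integer", "nullable": true }
        ]
      }
    }
  ]
}
```

If that form round-trips correctly, the bug narrows to the string form of complex types not being parsed on registration; if it fails too, the problem lies elsewhere.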
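To make the mapping the failing assertion checks explicit, here is a small hypothetical helper (not part of Gravitino or the test suite, purely illustrative) that rewrites the Spark `catalogString` values from the `[DEBUG]` lines into the Gravitino string form the Cucumber table expects:

```java
// Hypothetical, illustrative-only helper: maps Spark catalog type strings
// to the Gravitino string form used in this issue's expected-mapping table.
public class TypeNames {

    // Two renames cover the types in this issue: Spark's "array" container is
    // called "list" in Gravitino, and Spark's "int" is spelled "integer".
    // Plain substring replacement is enough here because no field name in the
    // example contains "array<" or "int".
    public static String sparkToGravitino(String sparkCatalogString) {
        return sparkCatalogString
                .replace("array<", "list<")
                .replace("int", "integer");
    }

    public static void main(String[] args) {
        // The three Spark types printed in the [DEBUG] lines of the log.
        String[] sparkTypes = {
            "array<string>", "map<string,int>", "struct<name:string,age:int>"
        };
        for (String t : sparkTypes) {
            System.out.println(t + " -> " + sparkToGravitino(t));
        }
    }
}
```

Whatever the fix ends up being, a loaded table should report `list<string>`, `map<string,integer>`, and `struct<name:string,age:integer>` for these columns instead of `unparsed`.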
