Armelabdelkbir opened a new issue, #11803:
URL: https://github.com/apache/hudi/issues/11803

   
   **Describe the problem you faced**
   hello i try to test several schema evolution usecases using hudi 0.15 and 
spark3.5 using hms 4
   first test: Adding column in PG --> debezium / schema registry ok --> hudi 
(MOR) hivesyncTool KO 
   ```
   org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:250)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:193)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167)
        ... 69 more
   Caused by: InvalidOperationException(message:The following columns have 
types incompatible with the existing columns in their respective positions :
   username)
   ```
   2. Type promotion following this doc: https://hudi.apache.org/docs/schema_evolution/#type-promotions. Double to String in PG OK --> Debezium / Schema Registry OK --> Hudi (MOR) HiveSyncTool KO:
   `Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Could not convert field Type from DOUBLE to string for field salary`
   3. Dropping a column in PG --> Debezium / Schema Registry OK --> Hudi (MOR) HiveSyncTool job does not fail, but I still see the dropped column, with null values for the newest inserts.
   My configuration is:
   ```
   "hoodie.datasource.hive_sync.ignore_exceptions" -> "true",
   "hoodie.write.set.null.for.missing.columns" -> "true",
   "hoodie.schema.on.read.enable" -> "true",
   "hive.metastore.disallow.incompatible.col.type.changes" -> "false"
   ```
   When I drop the tables (ro / rt) and restart my job, it recreates the tables correctly, but that is not a production-ready way to handle schema evolution.
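
   For context, the write path looks roughly like this (a sketch, not my exact job; `df` and `basePath` stand in for my actual DataFrame and table path):
   ```scala
   // Hudi 0.15 writer options from my config above
   val hudiOptions = Map(
     "hoodie.datasource.hive_sync.enable"            -> "true",
     "hoodie.datasource.hive_sync.ignore_exceptions" -> "true",
     "hoodie.write.set.null.for.missing.columns"     -> "true",
     "hoodie.schema.on.read.enable"                  -> "true"
   )
   // Note: hive.metastore.disallow.incompatible.col.type.changes=false is an
   // HMS-side property, set in the metastore's hive-site.xml, not here.
   df.write.format("hudi").options(hudiOptions).mode("append").save(basePath)
   ```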
   
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. On the PG source side:
   ```
   cdc_hudi=> ALTER TABLE employees ADD COLUMN test_str VARCHAR ;
   ALTER TABLE
   cdc_hudi=> INSERT INTO employees (name, department, username, test_str) VALUES ('armel011', 'Engineering', 'arm23220', 'teststr');
   INSERT 0 1
   ```
   2. Debezium / Schema Registry OK (the latest schema version contains the added column)
   
![image](https://github.com/user-attachments/assets/7de3ec08-1a52-4d91-b69c-24f1d5945da4)
   
   3. Restart the Spark job
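
   As an aside, the schema evolution doc linked above also describes explicit Spark SQL DDL for schema-on-read evolution. A sketch of what that would look like for this table (assumes a `spark` session with the Hudi SQL extensions enabled; table/column names are from my reproduction, and these are not commands my job actually runs):
   ```scala
   // Spark SQL DDL per https://hudi.apache.org/docs/schema_evolution
   // (requires hoodie.schema.on.read.enable=true)
   spark.sql("ALTER TABLE employees ADD COLUMNS (test_str string)")   // case 1
   spark.sql("ALTER TABLE employees ALTER COLUMN salary TYPE string") // case 2
   spark.sql("ALTER TABLE employees DROP COLUMN test_str")            // case 3
   ```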
   
   
   **Expected behavior**

   In all three cases, HiveSyncTool should sync the evolved schema to the metastore without failing:

   Case 1. The added column appears in the Hive tables.
   Case 2. The data type is promoted from double to string.
   Case 3. The dropped column is removed, without having to drop and recreate the ro / rt tables.
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version : 3.5.1
   
   * Hive version : 4.0.0
   
   * Hadoop version : 3.4
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing employees
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:170)
        at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:79)
        ... 68 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to update 
table for employees_ro
        at 
org.apache.hudi.hive.ddl.HMSDDLExecutor.updateTableDefinition(HMSDDLExecutor.java:162)
        at 
org.apache.hudi.hive.HoodieHiveSyncClient.updateTableSchema(HoodieHiveSyncClient.java:205)
        at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:347)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:250)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:193)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:167)
        ... 69 more
   Caused by: InvalidOperationException(message:The following columns have 
types incompatible with the existing columns in their respective positions :
   test_str)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:59744)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result$alter_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:59730)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$alter_table_with_environment_context_result.read(ThriftHiveMetastore.java:59672)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1693)
        at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1677)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:373)
        at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:322)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
   ```
   
   

