[GitHub] [hudi] kazdy opened a new issue, #5899: [SUPPORT] MERGE INTO with UPDATE / INESRT - new incoming columns dropped, automatic schema evolution feature

GitBox Fri, 17 Jun 2022 16:14:02 -0700


kazdy opened a new issue, #5899:
URL: https://github.com/apache/hudi/issues/5899


   **Describe the problem you faced**
   
   I've tried using MERGE INTO with UPDATE * and INSERT * statement with full 
schema evolution enabled. 
   I've noticed that during insert new columns from incoming (that do not exist 
in target table yet) are dropped and target schema is applied. 
   
   Therefore can we as users automatically evolve schema on MERGE INTO 
operations? 
   I guess this should only be supported when we use update set * and insert * 
in merge operation.
   
   **Expected behavior**
   
   When incoming data is missing columns that already declared in target table 
these should be injected with default/null values.
   When incoming data has new columns that are not yet declared in the target 
table, these should be added to the target table.
   Case when incoming data has both missing columns and new columns, missing 
coluns should be injected with null/ default values, new columns should be 
added to the target table.
   
   New columns should be reflected in metastore table schema.
   
   Would be great to support complex types, and nested schemas.
   
   Thread from dev mailing list as a reference:
   https://lists.apache.org/thread/kr59hh7yqr2c1y33kzfv3n97h6ydbz9b
   
   **Environment Description**
   
   * Hudi version : 0.11 
   
   * Spark version : 3.2.0-amzn
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] kazdy opened a new issue, #5899: [SUPPORT] MERGE INTO with UPDATE */ INESRT * - new incoming columns dropped, automatic schema evolution feature

Reply via email to

[GitHub] [hudi] kazdy opened a new issue, #5899: [SUPPORT] MERGE INTO with UPDATE / INESRT - new incoming columns dropped, automatic schema evolution feature