[ 
https://issues.apache.org/jira/browse/HUDI-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Kaźmirski updated HUDI-4276:
-----------------------------------
    Description: 
Improve schema reconciliation to make it more flexible when full 
schema evolution is enabled.

Desired behavior:
 # incoming data is missing columns that are already defined in the table -> 
null values will be injected into the missing columns 
 # incoming data contains new columns not yet defined in the table -> the 
columns will be added to the table schema (incoming dataframe?)
 # incoming data is missing columns that are already defined in the table and 
contains new columns not yet defined in the table -> the new columns will be 
added to the table schema and the missing columns will be filled with null values

No column should be dropped by the Hive sync utility when schema 
reconciliation is enabled.
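
A minimal sketch of the desired behavior, in plain Python rather than Hudi 
code. The reconcile function, the dict-based schema (column name -> type name) 
and the sample columns are all hypothetical and only illustrate the three cases 
above; type conflicts between the table and the incoming data are not handled.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified model: a schema maps column name -> type name.
Schema = Dict[str, str]
Row = Dict[str, Any]


def reconcile(table_schema: Schema, incoming_schema: Schema,
              rows: List[Row]) -> Tuple[Schema, List[Row]]:
    """Return the reconciled schema and the rows padded to that schema."""
    # Start from the table schema so no existing column is ever dropped.
    reconciled: Schema = dict(table_schema)
    # Append columns that only the incoming batch defines (case 2).
    for name, col_type in incoming_schema.items():
        reconciled.setdefault(name, col_type)
    # Fill columns that the incoming batch lacks with None (case 1).
    padded = [{name: row.get(name) for name in reconciled} for row in rows]
    return reconciled, padded


# Case 3: the batch lacks "name" and introduces "discount".
table = {"id": "long", "name": "string", "price": "double"}
incoming = {"id": "long", "price": "double", "discount": "double"}
batch = [{"id": 1, "price": 9.5, "discount": 0.1}]

schema, out = reconcile(table, incoming, batch)
print(list(schema))  # ['id', 'name', 'price', 'discount']
print(out)           # [{'id': 1, 'name': None, 'price': 9.5, 'discount': 0.1}]
```

The reconciled schema is the table schema followed by any new incoming 
columns, so existing column order is preserved and the result is the same 
whichever of the three cases applies.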

Related GH issue:
[https://github.com/apache/hudi/issues/5873]

 

  was:
Improve schema reconciliation to make it more flexible in presence of full 
schema evolution enabled.



Desired behavior:
 # incoming data has missing columns that were already defined in the table –> 
null values will be injected into missing columns 
 # incoming data contains new columns not defined yet in the table -> columns 
will be added to the table schema (incoming dataframe?)
 # incoming data has missing columns in the table and new columns in the table 
-> new columns will be added to the table schema, missing columns will be 
injected with null values

No column should be dropped when using hive sync utility.

Related GH issue:
[https://github.com/apache/hudi/issues/5873]

 


> Reconcile schema - inject null values for missing fields and add new fields
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-4276
>                 URL: https://issues.apache.org/jira/browse/HUDI-4276
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Daniel Kaźmirski
>            Priority: Minor
>
> Improve schema reconciliation to make it more flexible when full 
> schema evolution is enabled.
> Desired behavior:
>  # incoming data is missing columns that are already defined in the table 
> -> null values will be injected into the missing columns 
>  # incoming data contains new columns not yet defined in the table -> the 
> columns will be added to the table schema (incoming dataframe?)
>  # incoming data is missing columns that are already defined in the table 
> and contains new columns not yet defined in the table -> the new columns will 
> be added to the table schema and the missing columns will be filled with null 
> values
> No column should be dropped by the Hive sync utility when schema 
> reconciliation is enabled.
> Related GH issue:
> [https://github.com/apache/hudi/issues/5873]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
