[
https://issues.apache.org/jira/browse/HUDI-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-4872:
----------------------------------
Epic Link: HUDI-1297
> Support automatic schema evolution for SQL MERGE INTO with UPDATE * / INSERT *
> ------------------------------------------------------------------------------
>
> Key: HUDI-4872
> URL: https://issues.apache.org/jira/browse/HUDI-4872
> Project: Apache Hudi
> Issue Type: Improvement
> Components: writer-core
> Reporter: kazdy
> Assignee: Alexey Kudinkin
> Priority: Major
> Fix For: 0.13.0
>
>
> I tried using a MERGE INTO statement with UPDATE * and INSERT * clauses,
> with full schema evolution enabled.
> I noticed that during the insert, new columns from the incoming batch (ones
> that do not yet exist in the target table) are dropped and the target schema
> is applied. There are no warnings and no failed writes.
> Therefore, can we as users automatically evolve the schema on MERGE INTO
> operations?
> I guess this should only be supported when UPDATE SET * and INSERT * are
> used in the merge operation.
> *Expected behavior*
> When incoming data is missing columns that are already declared in the
> target table, these should be injected with default/null values.
> When incoming data has new columns that are not yet declared in the target
> table, these should be added to the target table.
> When incoming data has both missing columns and new columns, the missing
> columns should be injected with default/null values and the new columns
> should be added to the target table.
> New columns should be reflected in the metastore table schema.
> Complex types and nested schemas should be supported.
> Currently, something similar is supported for DataFrame writes if both the
> schema reconciliation and schema evolution configs are enabled; see HUDI-4276.
> From a user-experience perspective, it would be easier to have a
> _mergeSchema_ config (as for the Parquet Spark datasource) that enables
> this feature for both Spark SQL and DataFrame writes.
> Thread from dev mailing list as a reference:
> [https://lists.apache.org/thread/kr59hh7yqr2c1y33kzfv3n97h6ydbz9b]
> GH issue: [https://github.com/apache/hudi/issues/5899]
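
The expected behavior described in the issue can be illustrated with a small,
self-contained sketch. This is not Hudi code: schemas are modeled as plain
Python dicts (column name mapped to a type name, or to a nested dict for a
struct), and the helper names merge_schemas and conform_record are
hypothetical. Type conflicts between target and incoming columns are not
handled; the sketch only covers adding new columns and null-filling missing
ones, including inside nested structs.

```python
from typing import Any, Dict

# Hypothetical schema model: column name -> type name, or a nested dict for a struct.
Schema = Dict[str, Any]


def merge_schemas(target: Schema, incoming: Schema) -> Schema:
    """Return the target schema extended with columns found only in incoming."""
    merged: Schema = dict(target)
    for name, col_type in incoming.items():
        if name not in merged:
            # New column: add it to the evolved target schema.
            merged[name] = col_type
        elif isinstance(merged[name], dict) and isinstance(col_type, dict):
            # Both sides are structs: merge their fields recursively.
            merged[name] = merge_schemas(merged[name], col_type)
        # Otherwise the column already exists; the target type is kept as-is.
    return merged


def conform_record(record: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Project a record onto schema, filling missing columns with None."""
    out: Dict[str, Any] = {}
    for name, col_type in schema.items():
        value = record.get(name)  # None when the column is missing
        if isinstance(col_type, dict) and isinstance(value, dict):
            out[name] = conform_record(value, col_type)
        else:
            out[name] = value
    return out


# Target table lacks "email" and "address.zip"; incoming batch lacks "name".
target = {"id": "long", "name": "string", "address": {"city": "string"}}
incoming = {
    "id": "long",
    "email": "string",
    "address": {"city": "string", "zip": "string"},
}

evolved = merge_schemas(target, incoming)
row = conform_record(
    {"id": 1, "email": "a@b.c", "address": {"city": "X", "zip": "12345"}},
    evolved,
)
```

Here evolved keeps every target column and gains email and address.zip, while
row carries name as None because the incoming record did not supply it.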
--
This message was sent by Atlassian Jira
(v8.20.10#820010)