lokesh-lingarajan-0310 commented on code in PR #9881:
URL: https://github.com/apache/hudi/pull/9881#discussion_r1396339333
##########
website/docs/schema_evolution.md:
##########
@@ -22,21 +22,36 @@ the previous schema (e.g., renaming a column).
Furthermore, the evolved schema is queryable across high-performance engines
like Presto and Spark SQL without additional overhead for column ID
translations or
type reconciliations. The following table summarizes the schema changes
compatible with different Hudi table types.
-| Schema Change                                                                    | COW | MOR | Remarks |
-|:---------------------------------------------------------------------------------|:----|:----|:--------|
-| Add a new nullable column at root level at the end                               | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
-| Add a new nullable column to inner struct (at the end)                           | Yes | Yes | |
-| Add a new complex type field with default (map and array)                        | Yes | Yes | |
-| Add a new nullable column and change the ordering of fields                      | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
-| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`                  | Yes | Yes | |
-| Promote datatype from `int` to `long` for a field at root level                  | Yes | Yes | For other types, Hudi supports promotion as specified in [Avro schema resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution). |
-| Promote datatype from `int` to `long` for a nested field                         | Yes | Yes | |
-| Promote datatype from `int` to `long` for a complex type (value of map or array) | Yes | Yes | |
-| Add a new non-nullable column at root level at the end                           | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
-| Add a new non-nullable column to inner struct (at the end)                       | No  | No  | |
-| Change datatype from `long` to `int` for a nested field                          | No  | No  | |
-| Change datatype from `long` to `int` for a complex type (value of map or array)  | No  | No  | |
-
+The incoming schema will automatically have missing columns from the table schema added with null value
Review Comment:
   How about the following:

   The incoming schema will automatically have missing columns added with null values from the table schema. For this we need to enable the config `hoodie.write.handle.missing.cols.with.lossless.type.promotion`; otherwise the pipeline will fail. Note: this config also makes a best effort to handle some backward-incompatible type promotions, e.g., `long` to `int`.
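
   As a minimal sketch of what the suggested wording describes, the config would be supplied among the Hudi writer options. Only the `hoodie.write.handle.missing.cols.with.lossless.type.promotion` key is taken from this discussion; the table name, record key, and precombine field below are hypothetical placeholders, and the commented-out write call assumes a Spark session is available.

   ```python
   # Hypothetical Hudi writer options; only the missing-cols config key
   # comes from the review comment above, the rest are placeholders.
   hudi_options = {
       "hoodie.table.name": "example_table",                      # placeholder
       "hoodie.datasource.write.recordkey.field": "id",           # placeholder
       "hoodie.datasource.write.precombine.field": "ts",          # placeholder
       # Without this flag, a write whose incoming schema is missing columns
       # present in the table schema (or that needs a backward-incompatible
       # type promotion such as long -> int) fails the pipeline.
       "hoodie.write.handle.missing.cols.with.lossless.type.promotion": "true",
   }

   # With a Spark session, the options would be passed to the DataFrame writer:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
   ```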
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]