lokesh-lingarajan-0310 commented on code in PR #9881:
URL: https://github.com/apache/hudi/pull/9881#discussion_r1396339333
##########
website/docs/schema_evolution.md:
##########
@@ -22,21 +22,36 @@ the previous schema (e.g., renaming a column).
Furthermore, the evolved schema is queryable across high-performance engines
like Presto and Spark SQL without additional overhead for column ID
translations or
type reconciliations. The following table summarizes the schema changes
compatible with different Hudi table types.
-| Schema Change                                                                    | COW | MOR | Remarks |
-|:---------------------------------------------------------------------------------|:----|:----|:--------|
-| Add a new nullable column at root level at the end                               | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
-| Add a new nullable column to inner struct (at the end)                           | Yes | Yes | |
-| Add a new complex type field with default (map and array)                        | Yes | Yes | |
-| Add a new nullable column and change the ordering of fields                      | No  | No  | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
-| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`                  | Yes | Yes | |
-| Promote datatype from `int` to `long` for a field at root level                  | Yes | Yes | For other types, Hudi supports promotion as specified in [Avro schema resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution). |
-| Promote datatype from `int` to `long` for a nested field                         | Yes | Yes | |
-| Promote datatype from `int` to `long` for a complex type (value of map or array) | Yes | Yes | |
-| Add a new non-nullable column at root level at the end                           | No  | No  | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
-| Add a new non-nullable column to inner struct (at the end)                       | No  | No  | |
-| Change datatype from `long` to `int` for a nested field                          | No  | No  | |
-| Change datatype from `long` to `int` for a complex type (value of map or array)  | No  | No  | |
-
+The incoming schema will automatically have missing columns from the table schema added with null value
Review Comment:
   How about the following:

   The incoming schema will automatically have missing columns added with null values from the table schema. For this we need to enable the config `hoodie.write.handle.missing.cols.with.lossless.type.promotion`; otherwise the pipeline will fail. Note: this config also makes a best effort to handle some backward-incompatible type promotions, e.g., `long` to `int`.
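
   As a minimal sketch of what the suggested wording describes, the config would be supplied among the Hudi writer options. Only the `hoodie.write.handle.missing.cols.with.lossless.type.promotion` key is taken from this discussion; the table name, record key, and precombine field below are hypothetical placeholders, and the commented-out write call assumes a Spark session is available.

   ```python
   # Hypothetical Hudi writer options; only the missing-cols config key
   # comes from the review comment above, the rest are placeholders.
   hudi_options = {
       "hoodie.table.name": "example_table",                      # placeholder
       "hoodie.datasource.write.recordkey.field": "id",           # placeholder
       "hoodie.datasource.write.precombine.field": "ts",          # placeholder
       # Without this flag, a write whose incoming schema is missing columns
       # present in the table schema (or that needs a backward-incompatible
       # type promotion such as long -> int) fails the pipeline.
       "hoodie.write.handle.missing.cols.with.lossless.type.promotion": "true",
   }

   # With a Spark session, the options would be passed to the DataFrame writer:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
   ```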
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]