[jira] [Commented] (HUDI-8629) MergeInto w/ Partial updates pulls in fields from source not in assignment clause

Y Ethan Guo (Jira) Tue, 14 Jan 2025 14:09:40 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913083#comment-17913083
 ]


Y Ethan Guo commented on HUDI-8629:
-----------------------------------

The gist is the MERGE INTO statement explicitly sets the "hoodie.write.schema" 
to be the table schema (in MergeIntoHoodieTableCommand), so the source fields 
do not get into the table even if the deduced writer schema in the SQL writer 
contains the source fields.  This means that MIT does not support schema 
evolution now, which is the expectation since the beginning of MIT support in 
Hudi.

> MergeInto w/ Partial updates pulls in fields from source not in assignment 
> clause
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-8629
>                 URL: https://issues.apache.org/jira/browse/HUDI-8629
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.1
>
>         Attachments: image-2024-12-02-04-07-54-483.png
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> TestPartialUpdateForMergeInto.Test partial update with MOR and Avro log 
> format  w/ some slight changes. 
>  
> spark.sql(s"set 
> ${HoodieWriteConfig.MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT.key} = 0")
> spark.sql(s"set 
> ${DataSourceWriteOptions.ENABLE_MERGE_INTO_PARTIAL_UPDATES.key} = true")
> spark.sql(s"set ${HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key} = 
> $logDataBlockFormat")
> spark.sql(s"set ${HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key} = false")
> // Create a table with five data fields
> spark.sql(
> s"""
> |create table $tableName (
> | id int,
> | name string,
> | price long,
> | _ts long,
> | description string
> |) using hudi
> |tblproperties(
> | type ='$tableType',
> | primaryKey = 'id',
> | preCombineField = '_ts'
> |)
> |location '$basePath'
> """.stripMargin)
> spark.sql(s"insert into $tableName values (1, 'a1', 10, 1000, 'a1: desc1')," +
> "(2, 'a2', 20, 1200, 'a2: desc2'), (3, 'a3', 30.0, 1250, 'a3: desc3')")
>  
>  
> spark.sql(
> s"""
> |merge into $tableName t0
> |using ( select 1 as id, 'a1' as name, 12 as price, 1001 as ts
> |union select 3 as id, 'a3' as name, 25 as price, 1260 as ts) s0
> |on t0.id = s0.id
> |when matched then update set price = s0.price, _ts = s0.ts
> |""".stripMargin)
>  
>  
> While executing this MergeInto statement, we modify the schema to be as 
> follows. 
> !image-2024-12-02-04-07-54-483.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HUDI-8629) MergeInto w/ Partial updates pulls in fields from source not in assignment clause

Reply via email to