[jira] [Comment Edited] (HUDI-8629) MergeInto w/ Partial updates pulls in fields from source not in assignment clause

Y Ethan Guo (Jira) Tue, 14 Jan 2025 12:25:06 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913063#comment-17913063
 ]


Y Ethan Guo edited comment on HUDI-8629 at 1/14/25 8:24 PM:
------------------------------------------------------------

Similarly, for a MERGE INTO statement with partial updates, the source field 
names have to be in the writer schema so that update actions containing both 
source and target table fields can be resolved.  However, this does not change 
the table schema, because the "hoodie.write.schema" used by the write client 
does not contain the source fields, so there is no issue here.
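
To make the distinction concrete, here is a minimal, Spark-free sketch using 
the field names from the repro quoted below.  The object and value names are 
hypothetical and only model the idea; this is not Hudi's actual code, and the 
plain union of names is a simplifying assumption.

```scala
// Hypothetical model of the two schemas discussed in this comment.
object MergeIntoSchemaSketch {
  // Target table columns from the repro's CREATE TABLE statement.
  val targetFields: Seq[String] = Seq("id", "name", "price", "_ts", "description")

  // Source columns from the `using (...) s0` subquery in the MERGE INTO.
  val sourceFields: Seq[String] = Seq("id", "name", "price", "ts")

  // Assumption: the schema used to resolve update expressions needs the
  // source names as well as the target names (modelled as a simple union).
  val resolutionFields: Seq[String] = (targetFields ++ sourceFields).distinct

  // Assumption: the schema handed to the write client
  // ("hoodie.write.schema") is limited to the target table fields.
  val writeSchemaFields: Seq[String] = targetFields

  // Source-only names; these must not end up in the table schema.
  val sourceOnly: Seq[String] = sourceFields.filterNot(targetFields.contains)
}
```

In this model the only source-only name is `ts`.  It is available while the 
update expressions are resolved, but it is absent from `writeSchemaFields`, 
which matches the observation that the table schema stays unchanged.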


was (Author: JIRAUSER280684):
Similar for MERGE INTO statement with partial updates, the source field names 
have to be in the writer schema for resolving update actions containing both 
source and target table fields.  However, this does not cause the table schema 
to change, so there is no issue here.

> MergeInto w/ Partial updates pulls in fields from source not in assignment 
> clause
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-8629
>                 URL: https://issues.apache.org/jira/browse/HUDI-8629
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: sivabalan narayanan
>            Assignee: Y Ethan Guo
>            Priority: Blocker
>             Fix For: 1.0.1
>
>         Attachments: image-2024-12-02-04-07-54-483.png
>
>
> Repro: the test "Test partial update with MOR and Avro log format" in 
> TestPartialUpdateForMergeInto, with some slight changes:
>  
> spark.sql(s"set ${HoodieWriteConfig.MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT.key} = 0")
> spark.sql(s"set ${DataSourceWriteOptions.ENABLE_MERGE_INTO_PARTIAL_UPDATES.key} = true")
> spark.sql(s"set ${HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key} = $logDataBlockFormat")
> spark.sql(s"set ${HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key} = false")
> // Create a table with five data fields
> spark.sql(
> s"""
> |create table $tableName (
> | id int,
> | name string,
> | price long,
> | _ts long,
> | description string
> |) using hudi
> |tblproperties(
> | type ='$tableType',
> | primaryKey = 'id',
> | preCombineField = '_ts'
> |)
> |location '$basePath'
> """.stripMargin)
> spark.sql(s"insert into $tableName values (1, 'a1', 10, 1000, 'a1: desc1')," +
> "(2, 'a2', 20, 1200, 'a2: desc2'), (3, 'a3', 30.0, 1250, 'a3: desc3')")
>  
>  
> spark.sql(
> s"""
> |merge into $tableName t0
> |using ( select 1 as id, 'a1' as name, 12 as price, 1001 as ts
> |union select 3 as id, 'a3' as name, 25 as price, 1260 as ts) s0
> |on t0.id = s0.id
> |when matched then update set price = s0.price, _ts = s0.ts
> |""".stripMargin)
>  
>  
> While executing this MergeInto statement, we modify the schema to be as 
> follows. 
> !image-2024-12-02-04-07-54-483.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
