[
https://issues.apache.org/jira/browse/HUDI-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan reassigned HUDI-8871:
-----------------------------------------
Assignee: Sagar Sumit
> Data issue with hive sync if adding the column in between
> ---------------------------------------------------------
>
> Key: HUDI-8871
> URL: https://issues.apache.org/jira/browse/HUDI-8871
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: meta-sync
> Reporter: Aditya Goenka
> Assignee: Sagar Sumit
> Priority: Blocker
> Labels: hive-sync, schema-evolution
> Fix For: 1.0.2
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Reproducible Code
>
> ```
> tableName = "trips_table"
> basePath = "s3://<bucket>/results/temporary/trips_table_wmt"
> columns = ["ts","uuid","rider","driver","fare","city"]
> data = [
>     (1695159649087,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
>     (1695091554788,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70,"san_francisco"),
>     (1695046462179,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90,"san_francisco"),
>     (1695516137016,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
>     (1695115999911,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai"),
> ]
> inserts = spark.createDataFrame(data).toDF(*columns)
> hudi_options = {
>     'hoodie.table.name': tableName,
>     'hoodie.datasource.write.partitionpath.field': 'city',
>     'hoodie.datasource.write.operation': 'upsert',
>     'hoodie.datasource.meta.sync.enable': 'true',
> }
> inserts.write.format("hudi"). \
>     options(**hudi_options). \
>     mode("overwrite"). \
>     save(basePath)
> columns2 = ["ts","uuid","rider","newcol","driver","fare","city"]
> data2 = [
>     (1695159649087,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco","newval"),
> ]
> inserts2 = spark.createDataFrame(data2).toDF(*columns2)
> inserts2.write.format("hudi"). \
>     options(**hudi_options). \
>     mode("append"). \
>     save(basePath)
> ```
>
> Output: the values of newcol are wrong, and the values have shifted because
> of the order of the columns.
> +-------------------+--------------------+--------------------+----------------------+--------------------+-------------+--------------------+-------+--------+-------------+--------+-------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|  _hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|           ts|                uuid|  rider|  driver|         fare|  newcol|         city|
> +-------------------+--------------------+--------------------+----------------------+--------------------+-------------+--------------------+-------+--------+-------------+--------+-------------+
> |  20250116074123564|20250116074123564...|20250116074123564...|                newval|439069ea-0289-4e5...|1695159649087|334e26e9-8355-45c...|rider-A|    19.1|san_francisco|driver-K|       newval|
> |  20250116074056916|20250116074056916...|20250116074056916...|         san_francisco|af0b7568-fc25-4a3...|1695046462179|9909a8b1-2d15-4d3...|rider-D|driver-L|         33.9|    null|san_francisco|
> |  20250116074056916|20250116074056916...|20250116074056916...|         san_francisco|af0b7568-fc25-4a3...|1695091554788|e96c4396-3fad-413...|rider-C|driver-M|         27.7|    null|san_francisco|
> |  20250116074056916|20250116074056916...|20250116074056916...|         san_francisco|af0b7568-fc25-4a3...|1695159649087|334e26e9-8355-45c...|rider-A|driver-K|         19.1|    null|san_francisco|
> |  20250116074056916|20250116074056916...|20250116074056916...|             sao_paulo|fc3f2a03-d722-4de...|1695516137016|e3cf430c-889d-401...|rider-F|driver-P|        34.15|    null|    sao_paulo|
> |  20250116074056916|20250116074056916...|20250116074056916...|               chennai|e2fd2aa4-4d78-433...|1695115999911|c8abbe79-8d89-47e...|rider-J|driver-T|        17.85|    null|      chennai|
> +-------------------+--------------------+--------------------+----------------------+--------------------+-------------+--------------------+-------+--------+-------------+--------+-------------+
>
> Note: This fails with Hudi 0.X versions.
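
When comparing the output with the input, it may help to check which value the second batch assigns to each column name before Hudi or Hive sync is involved. `toDF(*columns2)` renames columns by position, so the pairing can be reproduced without Spark. The sketch below is only an illustration: `pair_by_position` is a hypothetical helper, not part of PySpark or Hudi, and it says nothing about what the writer or meta-sync does internally.

```python
# Spark-free sketch: reproduce how toDF(*columns2) pairs names with tuple positions.
# `pair_by_position` is a hypothetical helper introduced only for this check.
columns2 = ["ts", "uuid", "rider", "newcol", "driver", "fare", "city"]
row2 = (1695159649087, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A",
        "driver-K", 19.10, "san_francisco", "newval")


def pair_by_position(names, values):
    """Pair each column name with the value at the same index."""
    if len(names) != len(values):
        raise ValueError("column count and value count differ")
    return dict(zip(names, values))


second_batch = pair_by_position(columns2, row2)

# Print the name-to-value mapping the second write receives as input.
for name, value in second_batch.items():
    print(f"{name:>6} = {value!r}")
```

Comparing this mapping with the first row of the output can help separate a mismatch already present in the input tuple from a shift introduced during the write or the sync.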
--
This message was sent by Atlassian Jira
(v8.20.10#820010)