rajgowtham24 opened a new issue #2075:
URL: https://github.com/apache/hudi/issues/2075
Hi team,
In one of our tables we use VERSION as the precombine field, and we pass the same
column through the write options, but it is not working as expected.
In other tables where the precombine field is a timestamp column,
everything works fine.
When the precombine key is numeric, values up to 9 behave as expected; once a
value exceeds 9, Hudi keeps the record with VERSION 9 and ignores the
larger values (e.g. 10).
I'm not sure whether this is a bug or whether I'm using an incorrect write option.
Sample dataframe values:
+-------+-------+----------+
| NAME|VERSION|CHANGED_BY|
+-------+-------+----------+
|T009S50| 3| USER001|
|T009S50| 2| USER002|
|T009S50| 1| USER002|
|T009S50| 5| USER001|
|T009S50| 4| USER001|
|T009S50| 6| USER002|
|T009S50| 7| USER002|
|T009S50| 8| USER001|
|T009S50| 9| USER001|
|T009S50| 10| USER003|
+-------+-------+----------+
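The symptom (9 winning over 10) looks consistent with the precombine values being compared lexicographically as strings rather than numerically; this is an assumption on my part, not something I have confirmed in the Hudi source. A minimal plain-Python sketch of the suspected cause:

```python
# Suspected cause (assumption, not confirmed against Hudi internals):
# if VERSION is carried as a string, comparison is lexicographic, so "9" > "10".
versions_as_strings = ["3", "2", "1", "5", "4", "6", "7", "8", "9", "10"]
versions_as_ints = [int(v) for v in versions_as_strings]

print(max(versions_as_strings))  # "9"  -- matches the record reported from Hive
print(max(versions_as_ints))     # 10   -- the expected latest version
```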
Write options used (a closing quote after "NAME" and the Unix-style S3 path
were mangled in the original paste; reformatted here for readability):

input_df.write.format("org.apache.hudi") \
    .option("hoodie.datasource.write.recordkey.field", "NAME") \
    .option("hoodie.datasource.write.precombine.field", "VERSION") \
    .option("hoodie.table.name", "TABLE1") \
    .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ") \
    .option("hoodie.datasource.hive_sync.enable", "true") \
    .option("hoodie.datasource.hive_sync.table", "TABLE1") \
    .option("hoodie.datasource.hive_sync.assume_date_partitioning", "false") \
    .option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor") \
    .mode("overwrite") \
    .save("s3://mybucket/tablepath/")
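If the root cause really is string comparison of the precombine values (again, an assumption), two workarounds come to mind, neither verified against Hudi 0.5.0: cast VERSION to a numeric Spark type before writing, e.g. `input_df.withColumn("VERSION", col("VERSION").cast("long"))`, or zero-pad the string so that lexicographic order matches numeric order. A plain-Python sketch of the padding idea, with a hypothetical helper name:

```python
def pad_version(v, width=10):
    """Zero-pad a numeric version string so lexicographic order
    equals numeric order (hypothetical workaround helper)."""
    return str(int(v)).zfill(width)

versions = ["3", "2", "1", "5", "4", "6", "7", "8", "9", "10"]

# Plain string comparison picks "9" (the reported symptom):
assert max(versions) == "9"
# After zero-padding, the true latest version wins:
assert max(versions, key=pad_version) == "10"
```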
Record inserted (as read from Hive):
T009S50 9 USER001
I have tested the above scenario on a few of our datasets.
Environment details:
- EMR: emr-6.0.0
- Hudi version: 0.5.0
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]