parisni opened a new issue, #10508:
URL: https://github.com/apache/hudi/issues/10508
In 0.14, the ComplexKeyGenerator does not produce the same record key as in
previous versions. This leads to duplicate data when upserting.
**To Reproduce**
```python
# Assumes an existing SparkSession named `spark` (e.g. the pyspark shell)
# started with the Hudi Spark bundle on the classpath.
tableName = "test_hudi"
basePath = f"/tmp/{tableName}"
df = spark.sql("select 1 event_id, 2 event_date, 3 version")
hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "version",
    "hoodie.datasource.hive_sync.enable": "false",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.metadata.enable": "true",
}
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath)
spark.read.format("hudi").load(basePath).show()
```
Output with 0.13.1:
```
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|event_id|version|event_date|
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
|  20240116135635023|20240116135635023...|        event_id:1|                     2|a1e0e599-c09c-44d...|       1|      3|         2|
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
```
Output with 0.14.1:
```
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|event_id|version|event_date|
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
|  20240116135503412|20240116135503412...|                 1|                     2|ce35287e-af94-48c...|       1|      3|         2|
+-------------------+--------------------+------------------+----------------------+--------------------+--------+-------+----------+
```
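To make the consequence concrete, here is a minimal pure-Python sketch of why the two key shapes produce duplicates on upsert. It is not Hudi's implementation: `key_0_13_style`, `key_0_14_style`, and `upsert` are hypothetical helpers that only mimic the `_hoodie_record_key` values shown in the two outputs above, and the "table" is a plain dict keyed by (partition path, record key).

```python
def key_0_13_style(record, field):
    # Key shape observed in the 0.13.1 output: "<field>:<value>".
    return f"{field}:{record[field]}"


def key_0_14_style(record, field):
    # Key shape observed in the 0.14.1 output: the bare value.
    return str(record[field])


def upsert(table, record, key_fn):
    # Toy upsert (an assumption for illustration): a row is replaced only
    # when both the partition path and the record key match exactly.
    key = (str(record["event_date"]), key_fn(record, "event_id"))
    table[key] = record


row = {"event_id": 1, "event_date": 2, "version": 3}
table = {}

# First write with the old key shape, then "upsert" the same logical row
# with the new key shape.
upsert(table, row, key_0_13_style)
upsert(table, {**row, "version": 4}, key_0_14_style)

# The second write did not match the first, so both rows are kept.
print(len(table))
print(sorted(table))
```

Because the key strings differ, the second write is treated as a new record instead of an update, which matches the duplication reported here.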
**Expected behavior**
The `_hoodie_record_key` should not change across versions, or the change
should at least be documented in the migration guide.
**Environment Description**
* Hudi version : 0.14.1
* Spark version : 3.2.2