hseagle opened a new issue #2538:
URL: https://github.com/apache/hudi/issues/2538
In the latest version, 0.7.0, MOR (merge-on-read) does not work: records with
duplicated keys are not merged.
Here is the sample application code:
```scala
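// Imports assumed for a spark-shell session with the Hudi 0.7.0 bundle on the classpath
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{current_timestamp, unix_timestamp}
case class Person(firstname:String, age:Int, gender:Int)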
val personDF = List(Person("tom",45,1),
Person("iris",44,0)).toDF.withColumn("ts",unix_timestamp).withColumn("insert_time",current_timestamp)
//val personDF2 = List(Person("peng",56,1),
//Person("iris",51,0),Person("jacky",25,1)).toDF.withColumn("ts",unix_timestamp).withColumn("insert_time",current_timestamp)
//personDF.write.mode(SaveMode.Overwrite).format("hudi").saveAsTable("employee")
val hudiCommonOptions = Map(
"hoodie.compact.inline" -> "true",
"hoodie.compact.inline.max.delta.commits" ->"1"
)
val tableName = "employee"
val hudiHiveOptions = Map(
DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
DataSourceWriteOptions.HIVE_URL_OPT_KEY ->
"jdbc:hive2://localhost:10000",
DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> tableName,
DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "gender",
DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY -> "true",
"hoodie.datasource.write.table.type"->"MERGE_ON_READ",
"hoodie.datasource.hive_sync.support_timestamp"->"true",
"hoodie.datasource.write.operation" -> "upsert",
DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY ->
classOf[MultiPartKeysValueExtractor].getName
)
val basePath = s"/tmp/$tableName"
personDF.write.format("hudi").
options(getQuickstartWriteConfigs).
option(PRECOMBINE_FIELD_OPT_KEY, "ts").
option(RECORDKEY_FIELD_OPT_KEY, "firstname").
option(PARTITIONPATH_FIELD_OPT_KEY, "gender").
option(TABLE_NAME, tableName).
options(hudiCommonOptions).
options(hudiHiveOptions).
mode(SaveMode.Overwrite).
save(basePath)
val personDF2 = List(Person("tom",26,1),
Person("iris",31,0),Person("jacky",35,1)).toDF.withColumn("ts",unix_timestamp).withColumn("insert_time",current_timestamp)
personDF2.write.format("hudi").
options(getQuickstartWriteConfigs).
option(PRECOMBINE_FIELD_OPT_KEY, "ts").
option(RECORDKEY_FIELD_OPT_KEY, "firstname").
option(PARTITIONPATH_FIELD_OPT_KEY, "gender").
option(TABLE_NAME, tableName).
options(hudiCommonOptions).
options(hudiHiveOptions).
mode(SaveMode.Append).
save(basePath)
sql(s"refresh table ${tableName}_rt")
sql(s"select firstname, age, gender, ts, insert_time from ${tableName}_rt").show(20,false)
```
The final query result is shown in the images. We can see that the
duplicated keys have not been merged.
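
For reference, the result expected from the two writes above can be modelled without Spark or Hudi. This is only a simplified sketch of upsert semantics, not Hudi's implementation. It assumes that an incoming record replaces the stored record with the same partition path and record key, and that `ts` is used only to deduplicate within a batch. The names `Rec` and `upsert` and the `ts` values are hypothetical placeholders.

```scala
// Simplified model of the expected upsert outcome (not Hudi code).
// Rec, upsert and the ts values below are hypothetical, for illustration only.
case class Rec(firstname: String, age: Int, gender: Int, ts: Long)

// A record is identified by (partition path, record key) = (gender, firstname).
def key(r: Rec): (Int, String) = (r.gender, r.firstname)

def upsert(existing: Seq[Rec], incoming: Seq[Rec]): Seq[Rec] = {
  // Within the incoming batch, keep the row with the largest precombine value.
  val deduped = incoming.groupBy(key).map { case (k, rs) => k -> rs.maxBy(_.ts) }
  val stored = existing.map(r => key(r) -> r).toMap
  // Entries from the right-hand map win, so incoming rows replace stored ones.
  (stored ++ deduped).values.toSeq.sortBy(_.firstname)
}

val batch1 = Seq(Rec("tom", 45, 1, 100L), Rec("iris", 44, 0, 100L))
val batch2 = Seq(Rec("tom", 26, 1, 200L), Rec("iris", 31, 0, 200L), Rec("jacky", 35, 1, 200L))

val merged = upsert(batch1, batch2)
merged.foreach(println)
```

Under these assumptions the merged view holds three rows (iris 31, jacky 35, tom 26), whereas the query on the `_rt` table also returns the older rows for tom and iris.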
