Shawn Chang created HUDI-8899:
---------------------------------

             Summary: Hudi 1.0 backward writer is not able to turn off MDT
                 Key: HUDI-8899
                 URL: https://issues.apache.org/jira/browse/HUDI-8899
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Shawn Chang


Hudi 1.0's backward writer cannot turn off MDT on a Hudi 0.14 table that has 
MDT enabled, even when `.option("hoodie.metadata.enable", "false")` is set.

 

Reproduction steps:
 # Create the table with 0.14.0

{code:java}
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
val df1 = Seq(
 (100, "2015-01-01", "event_name_900", "2015-01-01T13:51:39.340396Z", "type1"),
 (101, "2015-01-01", "event_name_546", "2015-01-01T12:14:58.597216Z", "type2"),
 (102, "2015-01-01", "event_name_345", "2015-01-01T13:51:40.417052Z", "type3"),
 (103, "2015-01-01", "event_name_234", "2015-01-01T13:51:40.519832Z", "type4"),
 (104, "2015-01-01", "event_name_123", "2015-01-01T12:15:00.512679Z", "type1"),
 (105, "2015-01-01", "event_name_678", "2015-01-01T13:51:42.248818Z", "type2"),
 (106, "2015-01-01", "event_name_890", "2015-01-01T13:51:44.735360Z", "type3"),
 (107, "2015-01-01", "event_name_944", "2015-01-01T13:51:45.019544Z", "type4"),
 (108, "2015-01-01", "event_name_456", "2015-01-01T13:51:45.208007Z", "type1"),
 (109, "2015-01-01", "event_name_567", "2015-01-01T13:51:45.369689Z", "type2"),
 (110, "2015-01-01", "event_name_789", "2015-01-01T12:15:05.664947Z", "type3"),
 (111, "2015-01-01", "event_name_322", "2015-01-01T13:51:47.388239Z", "type4")
 ).toDF("event_id", "event_date", "event_name", "event_ts", "event_type")
val r = scala.util.Random
val num =  r.nextInt(99999)
var tableName = "yxchang_hudi_cow_simple_14_" + num
var tablePath = "s3://<bucket>/" + tableName + "/"
df1.write.format("hudi")
 .option("hoodie.metadata.enable", "true")
 .option("hoodie.table.name", tableName)
 .option("hoodie.datasource.write.operation", "insert") // use insert
 .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
 .option("hoodie.datasource.write.recordkey.field", "event_id,event_date")
 .option("hoodie.datasource.write.partitionpath.field", "event_type") 
 .option("hoodie.datasource.write.precombine.field", "event_ts")
 .option("hoodie.datasource.write.keygenerator.class", 
"org.apache.hudi.keygen.ComplexKeyGenerator")
 .option("hoodie.datasource.hive_sync.enable", "true")
 .option("hoodie.datasource.meta.sync.enable", "true")
 .option("hoodie.datasource.hive_sync.mode", "hms")
 .option("hoodie.datasource.hive_sync.database", "yxchang_nolf")
 .option("hoodie.datasource.hive_sync.table", tableName)
 .option("hoodie.datasource.hive_sync.partition_fields", "event_type")
 .option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.MultiPartKeysValueExtractor")
 .mode(SaveMode.Append)
 .save(tablePath) {code}
2. Use Hudi 1.0 backward writer + Spark 3.5 to append data to this table and 
set .option("hoodie.metadata.enable", "false")
{code:java}
val appendDf = Seq(
 (142, "2015-01-02", "event_name_922", "2015-01-01T13:51:39.340396Z", "type1"),
 (143, "2015-01-03", "event_name_533", "2015-01-01T12:14:58.597216Z", "type2"),
 (124, "2015-01-04", "event_name_344", "2015-01-01T13:51:40.417052Z", "type3"),
 (125, "2015-01-05", "event_name_266", "2015-01-01T13:51:40.519832Z", "type4"),
 (126, "2015-01-06", "event_name_177", "2015-01-01T12:15:00.512679Z", "type1"),
 (127, "2015-01-07", "event_name_688", "2015-01-01T13:51:42.248818Z", "type2"),
 (128, "2015-01-08", "event_name_891", "2015-01-01T13:51:44.735360Z", "type3"),
 (129, "2015-01-09", "event_name_945", "2015-01-01T13:51:45.019544Z", "type4"),
 (120, "2015-01-10", "event_name_450", "2015-01-01T13:51:45.208007Z", "type1"),
 (131, "2015-01-11", "event_name_562", "2015-01-01T13:51:45.369689Z", "type2"),
 (132, "2015-01-12", "event_name_786", "2015-01-01T12:15:05.664947Z", "type3"),
 (133, "2015-01-13", "event_name_328", "2015-01-01T13:51:47.388239Z", "type4")
 ).toDF("event_id", "event_date", "event_name", "event_ts", "event_type")
 
 appendDf.write.format("hudi")
.option("hoodie.metadata.enable", "false")
.option("hoodie.table.name", tableName)
.option("hoodie.table.version", 6)
.option("hoodie.write.table.version", 6)
.option("hoodie.table.initial.version", 6)
.option("hoodie.datasource.write.operation", "insert") // use insert
.option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
//  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
//  .option("hoodie.compact.inline", inlineCompaction)
//  .option("hoodie.compact.inline.max.delta.commits", 1)
.option("hoodie.datasource.write.recordkey.field", "event_id,event_date")
.option("hoodie.datasource.write.partitionpath.field", "event_type") 
.option("hoodie.datasource.write.precombine.field", "event_ts")
.option("hoodie.datasource.write.keygenerator.class", 
"org.apache.hudi.keygen.ComplexKeyGenerator")
.option("hoodie.datasource.hive_sync.enable", "true")
.option("hoodie.datasource.meta.sync.enable", "true")
.option("hoodie.datasource.hive_sync.mode", "hms")
.option("hoodie.datasource.hive_sync.database", "yxchang_nolf")
.option("hoodie.datasource.hive_sync.table", tableName)
.option("hoodie.datasource.hive_sync.partition_fields", "event_type")
.option("hoodie.datasource.hive_sync.partition_extractor_class", 
"org.apache.hudi.hive.MultiPartKeysValueExtractor")
.mode(SaveMode.Append)
.save(tablePath) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to