ad1happy2go commented on issue #10029:
URL: https://github.com/apache/hudi/issues/10029#issuecomment-1803879007

@Shubham21k @pushpavanthar Thanks for raising this. I tried to reproduce the issue with the following code, but it worked fine with 0.13.1. Can you run this code in your setup to confirm? Alternatively, can you suggest changes to it that would help me reproduce the issue?
   
```python
from pyspark.sql import Row
from pyspark.sql.types import (
    IntegerType,
    MapType,
    StringType,
    StructField,
    StructType,
)

# get_spark_session is a local helper (not shown) that returns a SparkSession
# with the Hudi 0.13.1 bundle for Spark 3.2 on the classpath.
spark = get_spark_session(spark_version="3.2", hudi_version="0.13.1")

# Placeholders: set these to your own table name and base path.
TABLE_NAME = "hudi_map_test"
PATH = "/tmp/hudi_map_test"

schema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        StructField("country", StringType(), True),
        StructField("info", MapType(StringType(), StringType()), True),
    ]
)

# Initial batch: three records, each with a map-typed "info" column.
data = [
    Row(1, "John", "US", {"age": "30", "city": "New York"}),
    Row(2, "Alice", "US", {"age": "25", "city": "San Francisco"}),
    Row(3, "Bob", "Canada", {"age": "35", "city": "Toronto"}),
]
hudi_configs = {
    "hoodie.table.name": TABLE_NAME,
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "country",
    "hoodie.table.base.file.format": "PARQUET",
    "hoodie.table.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
}
df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

# Round-trip through plain Parquet, then write the result into the Hudi table.
df.write.mode("overwrite").parquet(PATH + "_parquet")
spark.read.parquet(PATH + "_parquet").createOrReplaceTempView("temp_parquet")
new_df = spark.sql("SELECT * FROM temp_parquet")
new_df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)

# Second batch: update record 1 so that info["city"] changes.
data = [
    Row(1, "John", "US", {"age": "30", "city": "San Francisco"})
]
df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

df.write.mode("overwrite").parquet(PATH + "_parquet")
spark.read.parquet(PATH + "_parquet").createOrReplaceTempView("temp_parquet")
new_df = spark.sql("SELECT * FROM temp_parquet")
# Write the update into the existing Hudi table (hudi_configs is not passed here).
new_df.write.format("org.apache.hudi").mode("append").save(PATH)

spark.read.format("hudi").load(PATH).show(20, False)
```
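
For comparison, here is a plain-Python sketch of the data columns I would expect the final `show()` to print (the Hudi read also returns the `_hoodie_*` metadata columns, which are ignored here). It assumes upsert semantics where an incoming record fully replaces the stored record with the same key, as with the overwrite-with-latest payload; `upsert_by_key` is a hypothetical helper for illustration, not a Hudi API.

```python
from typing import Dict, List, Tuple

# (id, name, country, info), mirroring the schema used in the repro above.
Record = Tuple[int, str, str, Dict[str, str]]


def upsert_by_key(existing: List[Record], incoming: List[Record]) -> List[Record]:
    """Hypothetical model of an upsert keyed on the record key "id".

    Assumption: each incoming record replaces the stored record with the same
    key and records with new keys are inserted. Precombine ordering among
    duplicate incoming keys is not modeled.
    """
    table = {rec[0]: rec for rec in existing}
    for rec in incoming:
        table[rec[0]] = rec
    return sorted(table.values(), key=lambda rec: rec[0])


first_batch = [
    (1, "John", "US", {"age": "30", "city": "New York"}),
    (2, "Alice", "US", {"age": "25", "city": "San Francisco"}),
    (3, "Bob", "Canada", {"age": "35", "city": "Toronto"}),
]
second_batch = [(1, "John", "US", {"age": "30", "city": "San Francisco"})]

expected = upsert_by_key(first_batch, second_batch)
for rec in expected:
    print(rec)
```

If your run shows something different for `id = 1` (for example the old `"New York"` value, or a failure on the map column), that difference is what I would like to narrow down.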

