Re: [I] [BUG] Getting ParquetDecodingException while writing to hudi dataset with MAP columns in hudi 0.13.1 [hudi]

via GitHub Thu, 09 Nov 2023 05:56:24 -0800


ad1happy2go commented on issue #10029:
URL: https://github.com/apache/hudi/issues/10029#issuecomment-1803879007


   @Shubham21k @pushpavanthar Thanks for raising this. I tried to reproduce 
this issue with the following code, but it worked fine with 0.13.1. Can you 
run this code in your setup to confirm? Alternatively, can you suggest changes 
to it that would help me reproduce the issue?
   
   ```
   
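   from pyspark.sql import Row
   from pyspark.sql.types import IntegerType, MapType, StringType, StructField, StructType
   
   # get_spark_session, TABLE_NAME and PATH are assumed to be defined elsewhere in the test setup.
   spark = get_spark_session(spark_version="3.2", hudi_version="0.13.1")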
   
   schema = StructType(
       [
           StructField("id", IntegerType(), True),
           StructField("name", StringType(), True),
           StructField("country", StringType(), True),
           StructField("info", MapType(StringType(), StringType()), True)
       ]
   )
   
   data = [
       Row(1, "John","US", {"age" : "30", "city" : "New York"}),
       Row(2, "Alice","US", {"age" : "25", "city" : "San Francisco"}),
       Row(3, "Bob","Canada", {"age" : "35", "city" : "Toronto"}),
   ]
   hudi_configs = {
       "hoodie.table.name": TABLE_NAME,
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.precombine.field": "country",
       "hoodie.table.base.file.format": "PARQUET",
       "hoodie.table.keygenerator.class": "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
   }
   df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
   
   df.write.mode("overwrite").parquet(PATH + "_parquet")
   
   spark.read.parquet(PATH + "_parquet").createOrReplaceTempView("temp_parquet")
   new_df = spark.sql("SELECT * FROM temp_parquet")
   
   new_df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   
   data = [
       Row(1, "John","US", {"age" : "30", "city" : "San Francisco"})
   ]
   
   df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
   
   df.write.mode("overwrite").parquet(PATH + "_parquet")
   
   spark.read.parquet(PATH + "_parquet").createOrReplaceTempView("temp_parquet")
   new_df = spark.sql("SELECT * FROM temp_parquet")
   new_df.write.format("org.apache.hudi").mode("append").save(PATH)
   
   spark.read.format("hudi").load(PATH).show(20, False)
   ```


