ad1happy2go commented on issue #11783:
URL: https://github.com/apache/hudi/issues/11783#issuecomment-2419220387

   @Gatsby-Lee I tried the metadata table with `hoodie.bloom.index.use.metadata` on emr-7.1.0 and didn't hit any issues. Can you try the snippet below, or share the code/configuration you are using?
   
   Code - 
   ```
    pyspark --jars /usr/lib/hudi/hudi-spark-bundle.jar \
      --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
      --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
      --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
   ```
   
   ```
    from pyspark.sql import Row
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # TABLE_NAME and PATH are placeholders: set them to your table name and storage path
    schema = StructType(
        [
            StructField("id", IntegerType(), True),
            StructField("name", StringType(), True)
        ]
    )
   
   data = [
       Row(1, "a"),
       Row(2, "a"),
       Row(3, "c"),
   ]
   
   
   hudi_configs = {
       "hoodie.table.name": TABLE_NAME,
       "hoodie.datasource.write.recordkey.field": "name",
       "hoodie.datasource.write.precombine.field": "id",
       "hoodie.datasource.write.operation":"insert_overwrite_table",
       "hoodie.table.keygenerator.class": 
"org.apache.hudi.keygen.NonpartitionedKeyGenerator",
       "hoodie.index.type" : "BLOOM",
       "hoodie.metadata.index.bloom.filter.enable" : "true",
       "hoodie.bloom.index.use.metadata" : "true"
   }
   
    df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

    df.write.format("org.apache.hudi").options(**hudi_configs).mode("overwrite").save(PATH)

    spark.read.format("hudi").load(PATH).show()

    for i in range(0, 30):
        df.write.format("org.apache.hudi").options(**hudi_configs).mode("append").save(PATH)
   ```
   
   
   Also, did you get a chance to try the OSS 0.15.0 version? OSS 0.14.1 doesn't officially support Spark 3.5.
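   If it helps, one way to try the OSS bundle instead of the EMR-shipped jar is to pull it via `--packages` (a sketch only; the Maven coordinate below assumes Spark 3.5 / Scala 2.12, so adjust it to match your cluster):

   ```
   # Launch pyspark with the OSS Hudi 0.15.0 Spark bundle from Maven Central
   # (coordinate assumed for Spark 3.5 / Scala 2.12)
   pyspark --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:0.15.0 \
     --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
     --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog" \
     --conf "spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
   ```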

