gentrit1 commented on issue #11560: URL: https://github.com/apache/hudi/issues/11560#issuecomment-2208488286
> Hey there, can you please provide a sample dataset using Faker so we can try to reproduce this behavior locally?

Is it okay for you to use the fake data from the [parquet file](https://we.tl/t-eBT7GTZabv) with the table options specified as below?

    hudi_options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": "UniqueNumber",  # key
        "hoodie.datasource.write.partitionpath.field": "Date,Job",
        "hoodie.datasource.write.precombine.field": "Timestamp",
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.combine.before.insert": "true",
        "hoodie.cleaner.commits.retained": "3",
        "hoodie.compact.inline.max.delta.commits": "2",
        "hoodie.enable.data.skipping": "true",
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        "hoodie.metadata.record.index.enable": "true",
        "hoodie.index.type": "RECORD_INDEX",
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": "1",
        "hoodie.clustering.plan.strategy.class": "org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy",
        "hoodie.clustering.plan.strategy.target.file.max.bytes": "40000000",
        "hoodie.clustering.plan.strategy.sort.columns": "UniqueNumber",
    }
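If the parquet link is unavailable, a stdlib-only sketch like the one below can produce comparable input. It assumes the table needs only the four columns the options reference (`UniqueNumber`, `Date`, `Job`, `Timestamp`). The column types, the job names, and the helpers `make_rows` and `precombine` are hypothetical, and Python's `random` module stands in for Faker. `precombine` imitates the documented rule for `hoodie.datasource.write.precombine.field`: when two records share a record key, the one with the larger precombine value is kept. It is an illustration, not Hudi's implementation.

```python
import random
from datetime import datetime, timedelta

# Hypothetical job names; the real dataset's values are unknown.
JOBS = ["engineer", "analyst", "manager"]


def make_rows(n_keys, dupes_per_key, seed=0):
    """Generate fake rows with the four columns referenced by hudi_options.

    Each UniqueNumber appears `dupes_per_key` times with a random Timestamp,
    so an upsert has duplicate keys to resolve. Date and Job are fixed per
    key so a key never moves between partitions (keeps the sketch simple).
    """
    rng = random.Random(seed)  # seeded for reproducible fake data
    base = datetime(2024, 7, 1)
    rows = []
    for key in range(1, n_keys + 1):
        job = rng.choice(JOBS)
        day = (base.date() + timedelta(days=key % 3)).isoformat()
        for _ in range(dupes_per_key):
            ts = base + timedelta(seconds=rng.randint(0, 86_400))
            rows.append({
                "UniqueNumber": key,
                "Date": day,
                "Job": job,
                # ISO strings of equal format compare correctly as text.
                "Timestamp": ts.isoformat(),
            })
    return rows


def precombine(rows, key="UniqueNumber", order="Timestamp"):
    """Keep, per record key, the row with the largest precombine value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order] > current[order]:
            latest[row[key]] = row
    return list(latest.values())


if __name__ == "__main__":
    rows = make_rows(n_keys=5, dupes_per_key=3)
    deduped = precombine(rows)
    print(len(rows), "rows ->", len(deduped), "after precombine")
    # prints "15 rows -> 5 after precombine"
```

Writing these rows out (for example with pandas or Spark) and upserting them with the options above should exercise the record key, precombine, and `Date,Job` partitioning paths together.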
