gentrit1 commented on issue #11560:
URL: https://github.com/apache/hudi/issues/11560#issuecomment-2208488286

   > Hey there, can you please provide a sample dataset using Faker so we can try locally to reproduce this behavior?
   
   Is it okay if you use the fake data from this [parquet file](https://we.tl/t-eBT7GTZabv), with the table options set as below?
   
   ```python
   table_name = "my_hudi_table"  # hypothetical placeholder, use your own table name

   hudi_options = {
       "hoodie.table.name": table_name,
       "hoodie.datasource.write.table.type": "MERGE_ON_READ",
       "hoodie.datasource.write.recordkey.field": "UniqueNumber",  # record key
       "hoodie.datasource.write.partitionpath.field": "Date,Job",
       "hoodie.datasource.write.precombine.field": "Timestamp",
       "hoodie.datasource.write.table.name": table_name,
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.combine.before.insert": "true",
       "hoodie.cleaner.commits.retained": "3",
       "hoodie.compact.inline.max.delta.commits": "2",
       "hoodie.enable.data.skipping": "true",
       "hoodie.metadata.enable": "true",
       "hoodie.metadata.index.column.stats.enable": "true",
       "hoodie.metadata.record.index.enable": "true",
       "hoodie.index.type": "RECORD_INDEX",
       "hoodie.clustering.inline": "true",
       "hoodie.clustering.inline.max.commits": "1",
       "hoodie.clustering.plan.strategy.class": "org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy",
       "hoodie.clustering.plan.strategy.target.file.max.bytes": "40000000",
       "hoodie.clustering.plan.strategy.sort.columns": "UniqueNumber",
   }
   ```
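   If downloading the file is not an option, here is a minimal sketch of a Faker-free stand-in, assuming the parquet only needs the four columns referenced by `hudi_options` (`UniqueNumber`, `Date`, `Job`, `Timestamp`); the column types, the `JOBS` values and the helper names are hypothetical. Each key gets several rows with different `Timestamp` values so precombine is exercised. `expected_after_precombine` emulates what a single upsert batch into an empty table should leave behind, assuming the default latest-wins payload (largest `Timestamp` per key). Partition values are kept fixed per key to avoid the separate question of records moving partitions under `RECORD_INDEX`.

   ```python
   import random
   from datetime import datetime, timedelta

   # Hypothetical schema: only the columns named in hudi_options.
   COLUMNS = ["UniqueNumber", "Date", "Job", "Timestamp"]
   JOBS = ["job_a", "job_b", "job_c"]  # hypothetical partition values


   def make_fake_rows(n_keys, dupes_per_key=2, seed=42):
       """Generate (UniqueNumber, Date, Job, Timestamp) tuples where every key
       appears dupes_per_key times with random Timestamps (deterministic seed)."""
       rng = random.Random(seed)
       base = datetime(2024, 7, 1)
       rows = []
       for key in range(n_keys):
           # Date and Job are fixed per key, so a key never changes partition.
           day = (base + timedelta(days=key % 3)).date().isoformat()
           job = JOBS[key % len(JOBS)]
           for _ in range(dupes_per_key):
               ts = base + timedelta(seconds=rng.randint(0, 86_400))
               rows.append((key, day, job, ts))
       return rows


   def expected_after_precombine(rows):
       """Keep, per UniqueNumber, the row with the largest Timestamp.
       Assumes latest-wins precombine and a single batch into an empty table."""
       latest = {}
       for row in rows:
           key, ts = row[0], row[3]
           if key not in latest or ts > latest[key][3]:
               latest[key] = row
       return sorted(latest.values())


   def write_with_hudi(spark, rows, base_path, options):
       """Upsert the fake rows with the given Hudi options.
       Needs a SparkSession with the Hudi bundle on the classpath."""
       df = spark.createDataFrame(rows, schema=COLUMNS)
       df.write.format("hudi").options(**options).mode("append").save(base_path)
   ```

   For example, `write_with_hudi(spark, make_fake_rows(1000), "/tmp/hudi_fake", hudi_options)` (the path is hypothetical), then compare `spark.read.format("hudi").load("/tmp/hudi_fake")` with `expected_after_precombine(make_fake_rows(1000))`.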


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
