Amar1404 commented on issue #10309:
URL: https://github.com/apache/hudi/issues/10309#issuecomment-1859549331
Hi @ad1happy2go - please find the configurations below:
"hoodie.schema.on.read.enable": "true"
"hoodie.cleaner.commits.retained": "3",
"hoodie.datasource.write.reconcile.schema": "true",
"hoodie.parquet.compression.codec": "zstd",
"hoodie.delete.shuffle.parallelism": "200",
"hoodie.parquet.max.file.size": "268435456",
"hoodie.upsert.shuffle.parallelism": "200",
"hoodie.datasource.hive_sync.support_timestamp": "true",
"hoodie.datasource.write.keygenerator.class":
"org.apache.hudi.keygen.CustomKeyGenerator",
"hoodie.datasource.write.hive_style_partitioning": "true",
"hoodie.insert.shuffle.parallelism": "200",
"hoodie.parquet.small.file.limit": "134217728",
"hoodie.bootstrap.parallelism": "200",
"hoodie.embed.timeline.server": "true",
"hoodie.bulkinsert.shuffle.parallelism": "200",
"hoodie.datasource.hive_sync.enable": "true",
"hoodie.filesystem.view.type": "EMBEDDED_KV_STORE",
"hoodie.clean.max.commits": "4"
hoodie.metadata.enable: true
spark.hadoop.fs.s3.canned.acl: BucketOwnerFullControl
hoodie.datasource.hive_sync.support_timestamp=true
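For reference, the settings above can be collected into a single options dict for a Spark write. This is a minimal sketch assuming PySpark with the Hudi bundle on the classpath; `hoodie.table.name`, the `df` DataFrame, and the target path are placeholders that were not part of the original configuration:

```python
# Sketch: the reported Hudi configuration as a PySpark write-options dict.
# "hoodie.table.name" is a hypothetical placeholder; all other keys/values
# come from the configuration listed above.
hudi_options = {
    "hoodie.table.name": "my_table",  # hypothetical, required by Hudi writes
    "hoodie.schema.on.read.enable": "true",
    "hoodie.cleaner.commits.retained": "3",
    "hoodie.datasource.write.reconcile.schema": "true",
    "hoodie.parquet.compression.codec": "zstd",
    "hoodie.delete.shuffle.parallelism": "200",
    "hoodie.parquet.max.file.size": "268435456",
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.datasource.hive_sync.support_timestamp": "true",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.CustomKeyGenerator",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.insert.shuffle.parallelism": "200",
    "hoodie.parquet.small.file.limit": "134217728",
    "hoodie.bootstrap.parallelism": "200",
    "hoodie.embed.timeline.server": "true",
    "hoodie.bulkinsert.shuffle.parallelism": "200",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.filesystem.view.type": "EMBEDDED_KV_STORE",
    "hoodie.clean.max.commits": "4",
    "hoodie.metadata.enable": "true",
}
```

The dict would then be applied to a write as, e.g., `df.write.format("hudi").options(**hudi_options).mode("append").save(target_path)`, with `df` and `target_path` supplied by the job.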
This is happening when the file is present in another partition or the parquet file is different. In your case, since the data volume is small, there will be only one parquet file.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]