ketkidev commented on issue #9674:
URL: https://github.com/apache/hudi/issues/9674#issuecomment-1715190309

   @ad1happy2go When you ask for the Cleaner config, what exactly are you looking for?
   We have deployed the cleaner utility on an EC2 instance and run it with the command below:
   ```
   /usr/local/bin/spark-submit \
     --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
     --conf spark.hadoop.fs.AbstractFileSystem.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
     --conf spark.jars.packages=org.apache.spark:spark-avro_2.12:3.0.1,org.apache.hadoop:hadoop-aws:3.2.2,com.amazonaws:aws-java-sdk-bundle:1.12.180,org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 \
     --class org.apache.hudi.utilities.HoodieCleaner /home/ubuntu/hudi-utilities-bundle.jar \
     --target-base-path s3a://bucket_name/table_path/reference/ \
     --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
     --hoodie-conf hoodie.keep.max.commits=50 \
     --hoodie-conf hoodie.keep.min.commits=49 \
     --hoodie-conf hoodie.cleaner.commits.retained=48 \
     --hoodie-conf hoodie.cleaner.parallelism=400
   ```
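
   As a side note, a sketch of how the same Hudi settings could be kept in a properties file instead of repeating `--hoodie-conf` flags (Hudi utilities generally accept a `--props` argument, though whether it applies here is an assumption, and the file name is hypothetical):
   ```
   # cleaner.properties (hypothetical file) -- same settings as the --hoodie-conf flags above
   hoodie.cleaner.policy=KEEP_LATEST_COMMITS
   hoodie.keep.max.commits=50
   hoodie.keep.min.commits=49
   hoodie.cleaner.commits.retained=48
   hoodie.cleaner.parallelism=400
   ```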

