ketkidev commented on issue #9674: URL: https://github.com/apache/hudi/issues/9674#issuecomment-1715190309
@ad1happy2go When you ask for the cleaner config, what exactly are you looking for? We have deployed the cleaner utility on an EC2 instance and run it with the command below:

```
/usr/local/bin/spark-submit \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
  --conf spark.hadoop.fs.AbstractFileSystem.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.jars.packages=org.apache.spark:spark-avro_2.12:3.0.1,org.apache.hadoop:hadoop-aws:3.2.2,com.amazonaws:aws-java-sdk-bundle:1.12.180,org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 \
  --class org.apache.hudi.utilities.HoodieCleaner \
  /home/ubuntu/hudi-utilities-bundle.jar \
  --target-base-path s3a://bucket_name/table_path/reference/ \
  --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
  --hoodie-conf hoodie.keep.max.commits=50 \
  --hoodie-conf hoodie.keep.min.commits=49 \
  --hoodie-conf hoodie.cleaner.commits.retained=48 \
  --hoodie-conf hoodie.cleaner.parallelism=400
```
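As a side note, a minimal sketch of the constraint Hudi expects between the three retention knobs used in the command above: the cleaner's `hoodie.cleaner.commits.retained` should be smaller than the archival window bounded by `hoodie.keep.min.commits` and `hoodie.keep.max.commits`, so the archiver never removes commits the cleaner still needs. The helper function below is purely illustrative, not a Hudi API:

```python
def retention_settings_valid(commits_retained: int,
                             keep_min_commits: int,
                             keep_max_commits: int) -> bool:
    """Illustrative check (not a Hudi API): the cleaner's retained-commit
    count must be below the archival minimum, which in turn must not
    exceed the archival maximum."""
    return commits_retained < keep_min_commits <= keep_max_commits

# Values from the spark-submit command in this comment (48 < 49 <= 50):
print(retention_settings_valid(48, 49, 50))  # → True
```

With the values shown (48/49/50) the constraint holds, though the window between min and max commits is very tight, which makes the archiver run on nearly every commit.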
