alexone95 opened a new issue, #8436:
URL: https://github.com/apache/hudi/issues/8436

   Hello, I'm trying to run the Hudi commit cleaning process as a step on an
EMR cluster via spark-submit. I am following the instructions at
https://hudi.apache.org/docs/hoodie_cleaner/, so I passed the following as the
script argument on EMR:
   
   spark-submit --class "org.apache.hudi.utilities.HoodieCleaner `ls 
packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` 
--target-base-path "PATH_TO_.hoodie" --hoodie-conf 
hoodie.cleaner.policy=KEEP_LATEST_COMMITS --hoodie-conf 
hoodie.cleaner.commits.retained=10 --hoodie-conf hoodie.cleaner.parallelism=200"
   
   but what I got is the following error: Error: Missing application resource.
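
   A possible cause, stated as an assumption and not a confirmed diagnosis: the
double quote opened before org.apache.hudi.utilities.HoodieCleaner is only
closed after hoodie.cleaner.parallelism=200, so spark-submit may receive the
whole string as the value of --class and find no application JAR. The sketch
below closes the quote right after the class name and only prints the resulting
argument list; it does not invoke spark-submit. HUDI_JAR is a hypothetical
placeholder for the hudi-utilities-bundle JAR path, and PATH_TO_.hoodie is kept
as the placeholder from the command above.

```shell
# Sketch: show how the shell splits the arguments, without calling spark-submit.
# HUDI_JAR is a hypothetical placeholder for the hudi-utilities-bundle JAR path.
HUDI_JAR="hudi-utilities-bundle.jar"

# Quote closed right after the class name, so the JAR becomes its own argument.
set -- --class "org.apache.hudi.utilities.HoodieCleaner" "$HUDI_JAR" \
  --target-base-path "PATH_TO_.hoodie" \
  --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \
  --hoodie-conf hoodie.cleaner.commits.retained=10 \
  --hoodie-conf hoodie.cleaner.parallelism=200

# Print one argument per line; on the cluster this list would follow spark-submit.
printf '%s\n' "$@"
```

   If the printed list shows the class name and the JAR on separate lines, the
same quoting can be tried in the EMR step argument.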
   
   **To Reproduce**
   
   To reproduce the problem, add a step on an EMR cluster with the argument
shown above.
   
   **Expected behavior**
   
   I expect to see only the 10 latest commits in the .hoodie table.
   
   **Environment Description**
   
       Hudi version : 0.12.1-amzn-0
       Spark version : 3.3.0
       Hive version : 3.1.3
       Hadoop version : 3.3.3 amz
       Storage (HDFS/S3/GCS..) : S3
       Running on Docker? (yes/no) : no (EMR 6.9.0)
   
   
   


