soumilshah1995 opened a new issue, #8400:
URL: https://github.com/apache/hudi/issues/8400

   Subject: Need help with offline compaction for MOR tables
   
   Good afternoon, I hope you are doing well. I would like some assistance for the next piece of content I am creating on Hudi offline compaction for MOR tables.
   After searching and reading, I am looking for guidance on how to submit an offline compaction job, and on whether I am missing anything.
   Attaching sample code:
   
   ```
   try:
       import json
       import uuid
       import os
       import boto3
       from dotenv import load_dotenv
   
       load_dotenv(".env")
   except Exception as e:
       pass
   
   global AWS_ACCESS_KEY
   global AWS_SECRET_KEY
   global AWS_REGION_NAME
   
   AWS_ACCESS_KEY = os.getenv("DEV_ACCESS_KEY")
   AWS_SECRET_KEY = os.getenv("DEV_SECRET_KEY")
   AWS_REGION_NAME = "us-east-1"
   
   client = boto3.client("emr-serverless",
                         aws_access_key_id=AWS_ACCESS_KEY,
                         aws_secret_access_key=AWS_SECRET_KEY,
                         region_name=AWS_REGION_NAME)
   
   
   def lambda_handler_test_emr(event, context):
        # ------------------ Hudi settings ---------------------------------------------
       glue_db = "hudi_db"
       table_name = "invoice"
       path = "s3://delta-streamer-demo-hudi/hudi/"
   
        # ---------------------------------------------------------------------------------
        #                                       EMR
        # ---------------------------------------------------------------------------------
       ApplicationId = os.getenv("ApplicationId")
       ExecutionTime = 600
       ExecutionArn = os.getenv("ExecutionArn")
       JobName = 'delta_streamer_{}'.format(table_name)
   
        # ---------------------------------------------------------------------------------
        spark_submit_parameters = ' --conf spark.jars=/usr/lib/hudi/hudi-utilities-bundle.jar'
        spark_submit_parameters += ' --class org.apache.hudi.utilities.HoodieCompactor /usr/lib/hudi/hudi-utilities-bundle.jar'
   
       arguments = [
           '--spark-memory', '1g',
           '--parallelism', '2',
           "--mode", "schedule",
           "--base-path", path,
           "--table-name", table_name
       ]
   
       response = client.start_job_run(
           applicationId=ApplicationId,
            clientToken=str(uuid.uuid4()),
           executionRoleArn=ExecutionArn,
           jobDriver={
               'sparkSubmit': {
                   'entryPoint': "command-runner.jar",
                   'entryPointArguments': arguments,
                   'sparkSubmitParameters': spark_submit_parameters
               },
           },
           executionTimeoutMinutes=ExecutionTime,
           name=JobName,
       )
       print("response", end="\n")
       print(response)
   
   
   lambda_handler_test_emr(context=None, event=None)
   
   ```
   
![image](https://user-images.githubusercontent.com/39345855/230452917-0cb4019d-4334-406a-bed9-1e9c5efca505.png)
   
   
   # Error
   ```
    Job failed, please check complete logs in configured logging destination. ExitCode: 1. Last few exceptions: Exception in thread "main" org.apache.hudi.com.beust.jcommander.ParameterException: Was passed main parameter 'command-runner.jar' but no main parameter was defined in your arg
   ```
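   
   From the stack trace, it looks as though HoodieCompactor received `command-runner.jar` as a stray positional argument. One variant I plan to try, assuming EMR Serverless accepts the bundle jar directly as the `entryPoint` (as far as I know, `command-runner.jar` is an EMR-on-EC2 convention), is sketched below; the jar path, table name, and base path are the same ones from my script above:
   
   ```python
   # Hypothetical variant: point entryPoint at the Hudi utilities bundle jar
   # itself instead of command-runner.jar, and select the compactor class via
   # sparkSubmitParameters. Paths are assumptions from my EMR image.
   table_name = "invoice"
   path = "s3://delta-streamer-demo-hudi/hudi/"
   hudi_bundle = "/usr/lib/hudi/hudi-utilities-bundle.jar"
   
   arguments = [
       "--spark-memory", "1g",
       "--parallelism", "2",
       "--mode", "schedule",
       "--base-path", path,
       "--table-name", table_name,
   ]
   
   job_driver = {
       "sparkSubmit": {
           # The jar is the entry point; HoodieCompactor is chosen with --class,
           # so no stray 'command-runner.jar' main parameter reaches jcommander.
           "entryPoint": hudi_bundle,
           "entryPointArguments": arguments,
           "sparkSubmitParameters": "--class org.apache.hudi.utilities.HoodieCompactor",
       },
   }
   
   print(job_driver["sparkSubmit"]["entryPoint"])
   ```
   
   This would then be passed as `jobDriver=job_driver` to `client.start_job_run(...)`. I have not verified it end to end, so please treat it as a sketch.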
   
   Looking forward to your guidance.
   
   References 
   https://hudi.apache.org/docs/compaction/
   https://github.com/apache/hudi/issues/6903
   
   
   