Ritika-Singhal opened a new issue #19207:
URL: https://github.com/apache/airflow/issues/19207


   ### Apache Airflow version
   
   2.0.2
   
   ### Operating System
   
   Linux (AWS MWAA)
   
   ### Versions of Apache Airflow Providers
   
   Version 2.0.2
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   While running an Airflow task that uses AwsGlueJobOperator to run a Glue script, these were the job args given:
   ```
   "GlueVersion": "3.0",
   "WorkerType": "G.2X",
   "NumberOfWorkers": 60,
   ```
   
   At this point, this is the error encountered:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 121, in execute
       glue_job_run = glue_job.initialize_job(self.script_args)
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 108, in initialize_job
       job_name = self.get_or_create_glue_job()
     File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 186, in get_or_create_glue_job
       **self.create_job_kwargs,
     File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
       raise error_class(parsed_response, operation_name)
   botocore.errorfactory.InvalidInputException: An error occurred (InvalidInputException) when calling the CreateJob operation: Please do not set Allocated Capacity if using Worker Type and Number of Workers.
   
   ```
   
   This happens because AwsGlueJobHook (when called by AwsGlueJobOperator) passes num_of_dpus (which defaults to 6 in the init method) as the AllocatedCapacity argument, as shown in the screenshot below (taken from the AwsGlueJobHook class):
   
![image](https://user-images.githubusercontent.com/36181425/138749154-d0c715c6-5f8c-426d-ae56-0aae684b04f3.png)
   
   The links to the source code are: 
   
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_modules/airflow/providers/amazon/aws/hooks/glue.html#AwsGlueJobHook.initialize_job
   
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_modules/airflow/providers/amazon/aws/operators/glue.html#AwsGlueJobOperator.template_fields
   
   So there is currently no way for the user to specify 'WorkerType' and 'NumberOfWorkers' without encountering the error above.
   
   This is because the AWS Glue API does not allow the "AllocatedCapacity" or "MaxCapacity" parameters to be set when 'WorkerType' and 'NumberOfWorkers' are specified. The AWS documentation describes this restriction:
https://docs.aws.amazon.com/en_us/glue/latest/dg/aws-glue-api-jobs-job.html
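The conflict can be illustrated without AWS access. The following is only a minimal sketch, not the provider's source: `build_create_job_args` and `validate_create_job_args` are hypothetical stand-ins for the hook's argument merging and for the Glue-side check, the job name is made up, and the error text is copied from the traceback above.

```python
def build_create_job_args(num_of_dpus=6, create_job_kwargs=None):
    """Hypothetical stand-in: always set AllocatedCapacity, then merge user kwargs."""
    args = {"Name": "example-job", "AllocatedCapacity": num_of_dpus}
    args.update(create_job_kwargs or {})
    return args


def validate_create_job_args(args):
    """Hypothetical stand-in for the restriction described above (an assumption
    about its exact shape): capacity and worker settings cannot be combined."""
    uses_workers = "WorkerType" in args and "NumberOfWorkers" in args
    uses_capacity = "AllocatedCapacity" in args or "MaxCapacity" in args
    if uses_workers and uses_capacity:
        raise ValueError(
            "Please do not set Allocated Capacity if using Worker Type "
            "and Number of Workers."
        )
    return args


# The job args from this report.
job_args = {"GlueVersion": "3.0", "WorkerType": "G.2X", "NumberOfWorkers": 60}
merged = build_create_job_args(create_job_kwargs=job_args)

try:
    validate_create_job_args(merged)
    conflict = None
except ValueError as err:
    conflict = str(err)
```

Because the default capacity is merged in unconditionally, the user-supplied worker settings always end up next to `AllocatedCapacity`, which is the combination the API rejects.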
   
   ### What you expected to happen
   
   The expected outcome is that Airflow runs the Glue job, taking "WorkerType" and "NumberOfWorkers" as parameters for Glue version 3.0.
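One way this behaviour might be achieved is to add the default capacity only when the caller has not supplied worker settings. This is a hedged sketch of the idea rather than a proposed patch: `create_job_args_without_conflict` is a hypothetical helper, the job name is made up, and the real hook passes more arguments than are shown here.

```python
def create_job_args_without_conflict(num_of_dpus=6, create_job_kwargs=None):
    """Hypothetical helper: include AllocatedCapacity only if the caller
    did not ask for WorkerType / NumberOfWorkers."""
    kwargs = dict(create_job_kwargs or {})
    args = {"Name": "example-job"}
    if "WorkerType" not in kwargs and "NumberOfWorkers" not in kwargs:
        # No worker settings supplied, so fall back to the DPU default.
        args["AllocatedCapacity"] = num_of_dpus
    args.update(kwargs)
    return args


with_workers = create_job_args_without_conflict(
    create_job_kwargs={
        "GlueVersion": "3.0",
        "WorkerType": "G.2X",
        "NumberOfWorkers": 60,
    }
)
with_defaults = create_job_args_without_conflict()
```

With this approach, existing users who rely on `num_of_dpus` would see no change, while users who pass worker settings would no longer send `AllocatedCapacity`.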
   
   ### How to reproduce
   
   This issue can be reproduced by the following steps:
   
   1. Set the job args dict to include the following keys and values.
   
   `  "GlueVersion": "3.0",
       "WorkerType": "G.2X",
       "NumberOfWorkers": 60,`
   
   2. Create a DAG with one step using AwsGlueJobOperator and assign the job_args dict to the `create_job_kwargs` parameter.
   3. Run the DAG; the error above will be encountered.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

