ahmad-maruf opened a new issue #9127:
URL: https://github.com/apache/airflow/issues/9127


   A bug in the latest stable version of Airflow (1.10.10) causes 
`EmrAddStepsOperator` to fail with the following `AttributeError`:
   ```
   [2020-06-03 18:05:06,862] {taskinstance.py:1145} ERROR - 'EMR' object has no attribute 'get_cluster_id_by_name'
   Traceback (most recent call last):
     File "/home/ubuntu/.pyenv/versions/3.7.7/envs/.venv_python377/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/ubuntu/.pyenv/versions/3.7.7/envs/.venv_python377/lib/python3.7/site-packages/airflow/contrib/operators/emr_add_steps_operator.py", line 74, in execute
       job_flow_id = emr.get_cluster_id_by_name(self.job_flow_name, self.cluster_states)
     File "/home/ubuntu/.pyenv/versions/3.7.7/envs/.venv_python377/lib/python3.7/site-packages/botocore/client.py", line 575, in __getattr__
       self.__class__.__name__, item)
   AttributeError: 'EMR' object has no attribute 'get_cluster_id_by_name'
   [2020-06-03 18:05:06,864] {taskinstance.py:1202} INFO - Marking task as FAILED.dag_id=my_spark_job_dag_id, task_id=my_spark_job_emr_add_step_id, execution_date=20200603T180500, start_date=20200603T180506, end_date=20200603T180506
   [2020-06-03 18:05:16,153] {logging_mixin.py:112} INFO - [2020-06-03 18:05:16,153] {local_task_job.py:103} INFO - Task exited with return code 1
   ```
   After digging through the operator source, I traced the bug to this line: 
https://github.com/apache/airflow/blob/b099571b9af739c5a96e7aed41be9f22912a3443/airflow/contrib/operators/emr_add_steps_operator.py#L74
   
   The root cause is that the operator calls `get_cluster_id_by_name` on the 
`botocore.client.EMR` object, which has no such attribute. The method belongs 
to the `airflow.contrib.hooks.emr_hook.EmrHook` object instead.
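
To make the mismatch concrete, here is a minimal sketch that reproduces the same error pattern. It uses stand-in classes, not the real Airflow or botocore ones: `FakeEmrClient`, `FakeEmrHook`, `resolve_buggy`, `resolve_fixed`, and the `"j-EXAMPLE"` id are all hypothetical names chosen for illustration.

```python
# Stand-in classes only; these are NOT the real botocore / Airflow classes.


class FakeEmrClient:
    """Mimics a client object: any unknown attribute raises AttributeError."""

    def __getattr__(self, item):
        raise AttributeError(
            "'%s' object has no attribute '%s'" % ("EMR", item)
        )


class FakeEmrHook:
    """Mimics the hook: it owns the lookup helper and hands out the client."""

    def get_conn(self):
        return FakeEmrClient()

    def get_cluster_id_by_name(self, emr_cluster_name, cluster_states):
        # Hypothetical fixed return value, for illustration only.
        return "j-EXAMPLE"


def resolve_buggy(hook, name, states):
    # Buggy pattern: the helper is looked up on the client, not on the hook.
    emr = hook.get_conn()
    return emr.get_cluster_id_by_name(name, states)


def resolve_fixed(hook, name, states):
    # Fixed pattern: the helper is looked up on the hook that defines it.
    return hook.get_cluster_id_by_name(name, states)
```

Calling `resolve_buggy` raises the same kind of `AttributeError` as in the log above, while `resolve_fixed` returns the cluster id.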
   
   Compare this with the **corrected** code in the Airflow 2.0.0dev version 
on the `master` branch:
   
https://github.com/apache/airflow/blob/ff5dcccbbd49e7a4632f93fa915565ac31730110/airflow/providers/amazon/aws/operators/emr_add_steps.py#L77
   
   In 1.10.10, this bug forces the user to pass `job_flow_id` directly when 
instantiating `EmrAddStepsOperator`, which in my opinion is not best 
practice.
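
For anyone stuck on 1.10.10, one possible way to obtain that `job_flow_id` is to look the cluster up by name yourself. The sketch below is only an assumption about how that could be done: `find_cluster_id` is a hypothetical helper, and it expects to be given a boto3 EMR client (for example from `boto3.client("emr")`), whose `list_clusters` call accepts `ClusterStates` and `Marker`.

```python
def find_cluster_id(emr_client, cluster_name, cluster_states):
    """Return the id of the single cluster named `cluster_name`.

    `emr_client` is assumed to behave like a boto3 EMR client.
    Raises ValueError when zero or several clusters match.
    """
    matches = []
    kwargs = {"ClusterStates": cluster_states}
    while True:
        response = emr_client.list_clusters(**kwargs)
        for cluster in response.get("Clusters", []):
            if cluster["Name"] == cluster_name:
                matches.append(cluster["Id"])
        # A "Marker" in the response means there are more pages to fetch.
        marker = response.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker

    if len(matches) != 1:
        raise ValueError(
            "expected exactly one cluster named %r, found %d"
            % (cluster_name, len(matches))
        )
    return matches[0]
```

The returned id can then be passed as `job_flow_id` to `EmrAddStepsOperator`. Note that doing this at DAG definition time makes an AWS call on every DAG parse, so it may be better done in an upstream task.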
   
   apache-airflow    1.10.10
   boto              2.49.0
   boto3             1.13.18
   botocore          1.16.21
   Python            3.7.7
   
   If this issue has already been fixed for Airflow 1.10.10, please point me 
to the instructions, as I'm not aware of a fix.



