Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149934903
> emr_conn_id is used in only EmrCreateJobFlowOperator and I feel it
expecting that emr_conn_id extra filed should contain JSON request body for
run_job_flow API. But keeping the boto3 API request body in DAG looks more
convenient to me than storing it in connection and if I'm passing
job_flow_overrides param in the task in that case then emr_conn_id should not
be mandatory.
If only case case make `emr_conn_id` non mandatory than probably better just
change receiving config
```python
config = {}
if self.emr_conn_id:
emr_conn = self.get_connection(self.emr_conn_id)
config = emr_conn.extra_dejson.copy()
```
Assume that user provide AWS Connection rather than EMR Connection
personally for me it is not good point.
I could pass `aws_conn_id=None` and assume that Airflow will use Instance
Profile / Task Role
> If the user will store aws_access_key_id in emr_conn_id extra then I'm
assuming that intention to use this connection as authentication rather than
using it in building request body.
If user deploy Airflow in AWS Cloud and follow AWS Best Practices he/she
won't use `aws_access_key_id` and most probably use Instance Profile / Task Role
Examples:
- User use `role_arn` in EMR Connection and expected that he/she switch to
this role, however error raise
- User use `region_name` in EMR Connection and expected that region
changed, however error raise
For me it is not clear why we only need to check `aws_access_key_id` ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]