Taragolis opened a new issue, #24030: URL: https://github.com/apache/airflow/issues/24030
### Apache Airflow Provider(s)

amazon

### Versions of Apache Airflow Providers

[main](https://github.com/apache/airflow/tree/main/airflow/providers/amazon) branch

### Apache Airflow version

main (development)

### Operating System

any

### Deployment

Other

### Deployment details

_No response_

### What happened

I'm investigating the amazon provider and found that different operators, sensors, and other components use different approaches to do the same things.

---

## Operators/Sensors set hook during initialisation (`__init__`)

At the moment operators/sensors use 4 different approaches to get a hook:

1. Set in `__init__`, which could consume additional resources of the scheduler/dag-processor
2. Set an empty hook (`None`) during initialisation and create it in a specific method (usually `get_hook`)
3. Cached property `hook` or similar
4. Defined in the `execute`/`poke` method

I think we should avoid 1 and 2.

**List of components**:

* `airflow.airflow.providers.amazon.aws.operators.batch.BatchOperator` - set during operator initialisation
* `airflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator` - set to `None` during operator initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.operators.ecs.EcsOperator` - set to `None` during operator initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator` - set during operator initialisation
* `airflow.airflow.providers.amazon.aws.sensors.batch.BatchSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.glue_catalog_partition.GlueCatalogPartitionSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor` - attributes `quicksight_hook` and `sts_hook` are not used
* `airflow.airflow.providers.amazon.aws.operators.rds.RdsBaseSensor` - set during sensor initialisation
* `airflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.s3.S3KeySensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor` - set to `None` during sensor initialisation, hook created by `get_hook`
* `airflow.airflow.providers.amazon.aws.sensors.step_function.StepFunctionExecutionSensor` - set to `None` during sensor initialisation, hook created by `get_hook`

---

## `region` vs `region_name` attribute

`AwsBaseHook` expects `region_name`, however some operators/sensors use `region`.
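
To make the mismatch concrete, here is a minimal sketch of how an operator could accept both spellings while treating `region` as the legacy one. It also builds the hook lazily through a cached property (approach 3 from the list of hook approaches earlier in this issue), so nothing is constructed in `__init__`. The names `FakeAwsHook` and `ExampleOperator` are hypothetical stand-ins and are not taken from the provider; the real classes may need to be handled differently.

```python
import warnings
from functools import cached_property
from typing import Optional


class FakeAwsHook:
    """Hypothetical stand-in for a hook that expects ``region_name``."""

    def __init__(self, aws_conn_id: str = "aws_default", region_name: Optional[str] = None):
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name


class ExampleOperator:
    """Hypothetical operator, not the provider's actual code."""

    def __init__(
        self,
        aws_conn_id: str = "aws_default",
        region_name: Optional[str] = None,
        region: Optional[str] = None,  # legacy spelling, kept only for compatibility
    ):
        if region is not None:
            warnings.warn(
                "Parameter `region` is deprecated, use `region_name` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # The new name wins if both are given.
            if region_name is None:
                region_name = region
        self.aws_conn_id = aws_conn_id
        self.region_name = region_name

    @cached_property
    def hook(self) -> FakeAwsHook:
        # Created on first access (e.g. inside execute), not in __init__.
        return FakeAwsHook(aws_conn_id=self.aws_conn_id, region_name=self.region_name)

    def execute(self, context=None) -> Optional[str]:
        return self.hook.region_name
```

With this shape, existing DAGs that pass `region=...` would keep working and only emit a `DeprecationWarning`, while the hook is not instantiated when the DAG file is merely parsed.
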
For consistency, it would be better to rename the argument to `region_name` and mark `region` as a deprecated field.

**List of components**:

* `airflow.airflow.providers.amazon.aws.operators.eks.EksCreateClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksCreateNodegroupOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksCreateFargateProfileOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteNodegroupOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksDeleteFargateProfileOperator`
* `airflow.airflow.providers.amazon.aws.operators.eks.EksPodOperator`
* `airflow.airflow.providers.amazon.aws.operators.redshift_data.RedshiftDataOperator`
* `airflow.airflow.providers.amazon.aws.operators.quicksight.QuickSightCreateIngestionOperator`
* `airflow.airflow.providers.amazon.aws.sensors.eks.EksClusterStateSensor`
* `airflow.airflow.providers.amazon.aws.sensors.eks.EksFargateProfileStateSensor`
* `airflow.airflow.providers.amazon.aws.sensors.eks.EksNodegroupStateSensor`

---

## No explicit `region_name`

Some components take `region_name` from the connection and have no `region_name` parameter/argument.

Note: at the moment only the Glacier components and some S3 operations are non-regional; however, even for these components it is better to set `region_name`.

**List of components**:

* `airflow.airflow.providers.amazon.aws.operators.athena.AthenaOperator`
* `airflow.airflow.providers.amazon.aws.operators.aws_lambda.AwsLambdaInvokeFunctionOperator`
* `airflow.airflow.providers.amazon.aws.operators.athena.CloudFormationCreateStackOperator`
* `airflow.airflow.providers.amazon.aws.operators.datasync.DataSyncOperator`
* `airflow.airflow.providers.amazon.aws.operators.dms.DmsCreateTaskOperator`
* `airflow.airflow.providers.amazon.aws.operators.dms.DmsDescribeTasksOperator`
* `airflow.airflow.providers.amazon.aws.operators.dms.DmsStartTaskOperator`
* `airflow.airflow.providers.amazon.aws.operators.dms.DmsStopTaskOperator`
* `airflow.airflow.providers.amazon.aws.operators.emr.EmrAddStepsOperator`
* `airflow.airflow.providers.amazon.aws.operators.emr.EmrContainerOperator`
* `airflow.airflow.providers.amazon.aws.operators.emr.EmrModifyClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.emr.EmrTerminateJobFlowOperator`
* `airflow.airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperator`
* `airflow.airflow.providers.amazon.aws.operators.rds.RdsBaseOperator` - and all dependencies
* `airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftCreateClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftResumeClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftPauseClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.redshift_cluster.RedshiftDeleteClusterOperator`
* `airflow.airflow.providers.amazon.aws.operators.redshift_sql.RedshiftSQLOperator`
* `airflow.airflow.providers.amazon.aws.operators.s3.S3DeleteBucketOperator`
* `airflow.airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator` - and all dependencies
* `airflow.airflow.providers.amazon.aws.operators.sns.SnsPublishOperator`
* `airflow.airflow.providers.amazon.aws.operators.sqs.SqsPublishOperator`
* `airflow.airflow.providers.amazon.aws.operators.sqs.StepFunctionStartExecutionOperator`
* `airflow.airflow.providers.amazon.aws.operators.step_function.StepFunctionStartExecutionOperator`
* `airflow.airflow.providers.amazon.aws.sensors.athena.AthenaSensor`
* `airflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationCreateStackSensor` - missing docstring
* `airflow.airflow.providers.amazon.aws.sensors.cloud_formation.CloudFormationDeleteStackSensor` - missing docstring
* `airflow.airflow.providers.amazon.aws.sensors.dms.DmsTaskBaseSensor`
* `airflow.airflow.providers.amazon.aws.sensors.emr.EmrBaseSensor` - and all dependencies
* `airflow.airflow.providers.amazon.aws.sensors.glue.GlacierJobOperationSensor`
* `airflow.airflow.providers.amazon.aws.sensors.glue_crawler.GlueCrawlerSensor`
* `airflow.airflow.providers.amazon.aws.sensors.quicksight.QuickSightSensor`
* `airflow.airflow.providers.amazon.aws.sensors.redshift_cluster.RedshiftClusterSensor`
* `airflow.airflow.providers.amazon.aws.sensors.sagemaker.SageMakerBaseSensor` - and all dependencies
* `airflow.airflow.providers.amazon.aws.sensors.sqs.SqsSensor`
* `airflow.airflow.providers.amazon.aws.sensors.sqs.StepFunctionExecutionSensor`
* `airflow.airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator`
* `airflow.airflow.providers.amazon.aws.transfers.hive_to_dynamodb.HiveToDynamoDBOperator`
* `airflow.airflow.providers.amazon.aws.transfers.redshift_to_s3.RedshiftToS3Operator` - __redshift_region_name__ ???

### What you think should happen instead

Try to implement these common pieces in a uniform way. It may help with changes/contributions in the future.

### How to reproduce

_No response_

### Anything else

I have not created a PR only because a single PR would affect almost all sensors/operators. IMHO, it is better to implement this in parts.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
