romibuzi commented on issue #29423: URL: https://github.com/apache/airflow/issues/29423#issuecomment-1422427027
Hi @vgutkovsk! Oh damn, indeed I realize I introduced a breaking change. Before, the check `if self.s3_bucket is None` was done only when the operator was creating the job. Now it is done at the start of the `create_glue_job_config()` method here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L103-L104

And this method is now called in all cases here: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L328

I realize `s3_bucket` is only used to determine `s3_log_path`: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L112

`script_location`, on the other hand, can be None and is not concatenated with `s3_bucket` at all. Maybe the best way to handle the problem would be to remove the check on `s3_bucket`, and if it is None, omit the `"LogUri"` parameter (which is what uses `s3_log_path`), as it is not a mandatory parameter for a Glue job: https://github.com/apache/airflow/blob/44024564cb3dd6835b0375d61e682efc1acd7d2c/airflow/providers/amazon/aws/hooks/glue.py#L118

cc @Taragolis
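To make the suggestion concrete, here is a minimal sketch of how the config could be built so that `"LogUri"` is simply omitted when `s3_bucket` is None, instead of raising. The function name mirrors the hook's method, but the signature and the `"logs/"` prefix are simplified assumptions for illustration, not the actual provider code:

```python
# Hypothetical sketch: build a Glue job config where "LogUri" is optional.
# Only the general shape matches the hook; names/defaults are assumptions.

def create_glue_job_config(job_name, role, s3_bucket=None, script_location=None):
    """Build a minimal Glue job config, skipping LogUri when no bucket is set."""
    config = {
        "Name": job_name,
        "Role": role,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
        },
    }
    # "LogUri" is not mandatory for a Glue job, so only include it when an
    # s3_log_path can actually be derived from s3_bucket.
    if s3_bucket is not None:
        config["LogUri"] = f"s3://{s3_bucket}/logs/{job_name}"
    return config
```

With this shape, an operator that never creates the job (and never set `s3_bucket`) would no longer trip over the check, while callers that do pass a bucket keep the same `"LogUri"` behavior as before.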
