mmenarguezpear opened a new issue #16418: URL: https://github.com/apache/airflow/issues/16418
**Apache Airflow version**: 2.1.0

**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): NA

**Environment**: bare-metal k8s in AWS EC2

- **Cloud provider or hardware configuration**: AWS
- **OS** (e.g. from /etc/os-release):
  ```
  cat /etc/os-release
  PRETTY_NAME="Debian GNU/Linux 10 (buster)"
  NAME="Debian GNU/Linux"
  VERSION_ID="10"
  VERSION="10 (buster)"
  VERSION_CODENAME=buster
  ID=debian
  HOME_URL="https://www.debian.org/"
  SUPPORT_URL="https://www.debian.org/support"
  BUG_REPORT_URL="https://bugs.debian.org/"
  ```
- **Kernel** (e.g. `uname -a`): Linux airflow-web-749866f579-ns9rk 5.4.0-1048-aws #50-Ubuntu SMP Mon May 3 21:44:17 UTC 2021 x86_64 GNU/Linux
- **Install tools**: pip, docker
- **Others**:

**What happened**:

Upon providing valid arguments, the following error appeared:

```
[2021-06-12 16:31:46,277] {base_aws.py:395} INFO - Creating session using boto3 credential strategy region_name=None
[2021-06-12 16:31:47,339] {taskinstance.py:1481} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1137, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 106, in execute
    s3_hook.load_file(self.script_location, self.s3_bucket, self.s3_artifacts_prefix + script_name)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 499, in load_file
    if not replace and self.check_for_key(key, bucket_name):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
    return func(*bound_args.args, **bound_args.kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 323, in check_for_key
    self.get_conn().head_object(Bucket=bucket_name, Key=key)
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
    api_params = self._emit_api_params(
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
    self.meta.events.emit(
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/botocore/handlers.py", line 236, in validate_bucket_name
    raise ParamValidationError(report=error_msg)
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "artifacts/glue-scripts/example.py": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
[2021-06-12 16:31:47,341] {taskinstance.py:1524} INFO - Marking task as UP_FOR_RETRY. dag_id=glue-example, task_id=example_glue_job_operator, execution_date=20210612T163143, start_date=20210612T163145, end_date=20210612T163147
[2021-06-12 16:31:47,386] {local_task_job.py:151} INFO - Task exited with return code 1
```

Looking at the order of the arguments, it seems the 2nd and 3rd are reversed. Furthermore, the operator does not expose `load_file`'s `replace` option, which would be very valuable. Note that the key and bucket name are passed by position, not by keyword (https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L104), and they are reversed relative to the hook's signature (https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/s3.py#L466).
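To make the proposed fix concrete, here is a minimal sketch of what the `load_file` call in `execute()` could look like, with the key and bucket passed by keyword so they cannot be swapped. Note that `replace_script_file` is a hypothetical new operator argument I am proposing; it does not exist today:

```python
# Sketch against S3Hook.load_file(filename, key, bucket_name=None, replace=False, ...)
s3_hook.load_file(
    self.script_location,                    # filename: local path to the script
    self.s3_artifacts_prefix + script_name,  # key: the S3 object key
    bucket_name=self.s3_bucket,              # bucket passed by keyword, so it cannot be misplaced
    replace=self.replace_script_file,        # hypothetical flag exposing the hook's replace option
)
```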
**What you expected to happen**:

The script upload to succeed, and to be able to replace an existing script in S3.

**How to reproduce it**:

Try to upload a local script to any S3 bucket:

```python
t2 = AwsGlueJobOperator(
    task_id="example_glue_job_operator",
    job_desc="Example Airflow Glue job",
    # Note the operator will upload the script if it is not an s3:// reference
    # See https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L101
    script_location="/opt/airflow/dags_lib/example.py",
    concurrent_run_limit=1,
    script_args={},
    num_of_dpus=1,  # This parameter is deprecated (from boto3). Use MaxCapacity in kwargs instead.
    aws_conn_id="aws_default",
    region_name="aws-region",
    s3_bucket="bucket-name",
    iam_role_name="iam_role_name_here",
    create_job_kwargs={},
)
```

**Anything else we need to know**:

**How often does this problem occur?** Every time, when using a local script.

I can take a stab at fixing it. I also noticed the operator does not allow updating a Glue job definition after its creation. boto3 offers an API to do so, but it is not exposed in this operator: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.update_job. It would be great if I could add that as well, but it might fall out of scope.
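In case it helps the discussion, a rough sketch of what calling that API could look like, using only the documented boto3 `update_job` call (the job name, role, and paths below are placeholders taken from the example above):

```python
import boto3

# Update an existing Glue job definition in place.
# JobUpdate fields follow the boto3 Glue.Client.update_job documentation.
glue_client = boto3.client("glue", region_name="aws-region")
glue_client.update_job(
    JobName="example_glue_job_operator",
    JobUpdate={
        "Role": "iam_role_name_here",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": "s3://bucket-name/artifacts/glue-scripts/example.py",
        },
        "MaxCapacity": 1.0,  # replaces the deprecated num_of_dpus / AllocatedCapacity
    },
)
```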