mmenarguezpear opened a new issue #16418:
URL: https://github.com/apache/airflow/issues/16418


   **Apache Airflow version**: 2.1.0
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): NA
   
   **Environment**: bare metal k8s in AWS EC2
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release):
   ```
   cat /etc/os-release
   PRETTY_NAME="Debian GNU/Linux 10 (buster)"
   NAME="Debian GNU/Linux"
   VERSION_ID="10"
   VERSION="10 (buster)"
   VERSION_CODENAME=buster
   ID=debian
   HOME_URL="https://www.debian.org/"
   SUPPORT_URL="https://www.debian.org/support"
   BUG_REPORT_URL="https://bugs.debian.org/"
   ```
   - **Kernel** (e.g. `uname -a`): Linux airflow-web-749866f579-ns9rk 5.4.0-1048-aws #50-Ubuntu SMP Mon May 3 21:44:17 UTC 2021 x86_64 GNU/Linux
   - **Install tools**: pip, docker
   - **Others**:
   
   **What happened**:
   After the operator was given valid arguments, the following error appeared:
   ```
   
   [2021-06-12 16:31:46,277] {base_aws.py:395} INFO - Creating session using boto3 credential strategy region_name=None
   [2021-06-12 16:31:47,339] {taskinstance.py:1481} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1137, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 106, in execute
       s3_hook.load_file(self.script_location, self.s3_bucket, self.s3_artifacts_prefix + script_name)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 499, in load_file
       if not replace and self.check_for_key(key, bucket_name):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 62, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 91, in wrapper
       return func(*bound_args.args, **bound_args.kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 323, in check_for_key
       self.get_conn().head_object(Bucket=bucket_name, Key=key)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
       return self._make_api_call(operation_name, kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 648, in _make_api_call
       request_dict = self._convert_to_request_dict(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 694, in _convert_to_request_dict
       api_params = self._emit_api_params(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/client.py", line 723, in _emit_api_params
       self.meta.events.emit(
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
       return self._emitter.emit(aliased_event_name, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
       return self._emit(event_name, kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
       response = handler(**kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/botocore/handlers.py", line 236, in validate_bucket_name
       raise ParamValidationError(report=error_msg)
   botocore.exceptions.ParamValidationError: Parameter validation failed:
   Invalid bucket name "artifacts/glue-scripts/example.py": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
   [2021-06-12 16:31:47,341] {taskinstance.py:1524} INFO - Marking task as UP_FOR_RETRY. dag_id=glue-example, task_id=example_glue_job_operator, execution_date=20210612T163143, start_date=20210612T163145, end_date=20210612T163147
   [2021-06-12 16:31:47,386] {local_task_job.py:151} INFO - Task exited with return code 1
   ```
   Looking at the order of the arguments, it seems the 2nd and 3rd are reversed. Furthermore, the operator does not expose the `replace` option, which would be very valuable.
   Note that the key and bucket name are passed by position rather than by keyword
   https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L104
   and they are reversed relative to the hook's signature
   https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/s3.py#L466
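   For illustration, a minimal sketch of the upload the operator is presumably trying to perform, calling `S3Hook.load_file` directly with the bucket and key passed by keyword and the `replace` flag set. The file path, key, and bucket name below are just the placeholders used elsewhere in this report:
   ```
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook

   # Sketch only: bucket and key passed by keyword so they cannot be swapped,
   # and replace=True so an existing script can be overwritten.
   s3_hook = S3Hook(aws_conn_id="aws_default")
   s3_hook.load_file(
       filename="/opt/airflow/dags_lib/example.py",  # local Glue script
       key="artifacts/glue-scripts/example.py",      # this value ended up as "Bucket" in the traceback
       bucket_name="bucket-name",                    # the actual bucket
       replace=True,                                 # not currently exposed by AwsGlueJobOperator
   )
   ```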
   
   **What you expected to happen**: The script upload to succeed, and to be able to replace an existing script in S3.
   
   **How to reproduce it**:
   Try to upload a local script to any S3 bucket:
   ```
   t2 = AwsGlueJobOperator(
       task_id="example_glue_job_operator",
       job_desc="Example Airflow Glue job",
       # Note the operator will upload the script if it is not an s3:// reference
       # See https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/glue.py#L101
       script_location="/opt/airflow/dags_lib/example.py",
       concurrent_run_limit=1,
       script_args={},
       num_of_dpus=1,  # This parameter is deprecated (from boto3). Use MaxCapacity via create_job_kwargs instead.
       aws_conn_id='aws_default',
       region_name="aws-region",
       s3_bucket="bucket-name",
       iam_role_name="iam_role_name_here",
       create_job_kwargs={},
   )
   ```
   
   **Anything else we need to know**:
   
   **How often does this problem occur?** Every time a local script is used.
   
    I can take a stab at fixing it. I also noticed the operator does not allow updating a Glue job definition after its creation. boto3 offers an API to do so, but it is not exposed in this operator:
   https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.update_job
   It would be great if I could add that as well, though it might fall out of scope.
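   For reference, a rough sketch of the boto3 call such an extension would wrap; the job name and update fields below are placeholders, not something the operator currently does:
   ```
   import boto3

   # Sketch only: update an existing Glue job definition instead of recreating it.
   glue_client = boto3.client("glue", region_name="aws-region")
   glue_client.update_job(
       JobName="example_glue_job_operator",  # placeholder job name
       JobUpdate={
           "Role": "iam_role_name_here",     # placeholder IAM role
           "Command": {
               "Name": "glueetl",
               "ScriptLocation": "s3://bucket-name/artifacts/glue-scripts/example.py",
           },
           "MaxCapacity": 1.0,               # replaces the deprecated AllocatedCapacity/num_of_dpus
       },
   )
   ```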

