[ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550683#comment-16550683
 ] 

ASF subversion and git services commented on AIRFLOW-2771:
----------------------------------------------------------

Commit 8e58053992aa8fd93d27283c8d97ff24491a68a8 in incubator-airflow's branch 
refs/heads/master from Mike Ascah
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8e58053 ]

[AIRFLOW-2771] Add except type to broad S3Hook try catch clauses

S3Hook will silently fail if given a conn_id that does not exist. The
calls to check_for_key done by an S3KeySensor will never fail if the
credentials object is not configured correctly. This adds the expected
ClientError exception type when performing a HEAD operation on an
object that doesn't exist to the try catch statements so that other
exceptions are properly raised.

Closes #3616 from mascah/AIRFLOW-2771-S3hook-except-type
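
To illustrate why a swallowed credentials error shows up only as a sensor timeout, here is a minimal sketch. It is not Airflow's sensor code: `run_sensor`, `poke_swallowing_errors`, and `poke_raising_errors` are hypothetical names, the clock is simulated rather than slept, and `NoCredentialsError` is a local stand-in for `botocore.exceptions.NoCredentialsError` so the example runs without botocore or AWS access.

```python
class NoCredentialsError(Exception):
    """Local stand-in for botocore.exceptions.NoCredentialsError (assumption)."""


def run_sensor(poke, timeout, poke_interval):
    """Hypothetical, simplified sensor loop with a simulated clock (no sleeping).

    Returns a (status, number_of_pokes) tuple. Any exception raised by
    poke() propagates to the caller immediately.
    """
    elapsed = 0
    pokes = 0
    while True:
        pokes += 1
        if poke():
            return "success", pokes
        elapsed += poke_interval
        if elapsed >= timeout:
            return "timeout", pokes


def poke_swallowing_errors():
    # Mirrors the old behavior: every failure is reported as "key not found".
    try:
        raise NoCredentialsError("Unable to locate credentials")
    except Exception:
        return False


def poke_raising_errors():
    # Mirrors the fixed behavior: a credentials problem is not a ClientError,
    # so it is not caught and surfaces on the first poke.
    raise NoCredentialsError("Unable to locate credentials")


if __name__ == "__main__":
    print(run_sensor(poke_swallowing_errors, timeout=180, poke_interval=60))
    try:
        run_sensor(poke_raising_errors, timeout=180, poke_interval=60)
    except NoCredentialsError as exc:
        print("failed on first poke:", exc)
```

With the swallowing poke, the loop runs until the simulated timeout without ever reporting the real cause. With the raising poke, the misconfiguration is visible on the first attempt.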


> S3Hook Broad Exception Silent Failure
> -------------------------------------
>
>                 Key: AIRFLOW-2771
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.9.0
>            Reporter: Micheal Ascah
>            Assignee: Micheal Ascah
>            Priority: Minor
>              Labels: S3Hook, S3Sensor
>             Fix For: 2.0.0
>
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id (it doesn't exist or 
> has bad permissions). There are also no credentials under 
> ~/.aws/credentials for boto to fall back on.
>  
> When poking for the key, the sensor creates an S3Hook and calls `check_for_key` 
> on the hook. If the call to HeadObject fails, the exception is caught by a bare 
> except clause that catches all exceptions, rather than only the 
> botocore.exceptions.ClientError expected when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no issue with the task 
> instance until it times out, rather than failing immediately when 
> the connection is incorrectly configured. The current logging output gives no 
> insight into why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> {code:python}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> Traceback (most recent call last):
> ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> h4. Proposed Change
> Add a type to the except clause for botocore.exceptions.ClientError and log 
> the message for both check_for_key and check_for_bucket on S3Hook.
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     self.log.info(e.response["Error"]["Message"]) 
>     return False
> {code}
>   
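
The difference between the broad and the narrowed except clause described in the quoted issue can be sketched without AWS access. This is an illustration under stated assumptions, not the S3Hook implementation: `FakeS3Client`, `check_for_key_broad`, and `check_for_key_narrow` are hypothetical names, and `ClientError` and `NoCredentialsError` are local stand-ins that mimic the botocore exceptions of the same name (in particular the `response["Error"]["Message"]` shape) so the example runs without botocore installed.

```python
import logging

log = logging.getLogger(__name__)


class ClientError(Exception):
    """Local stand-in for botocore.exceptions.ClientError (assumption)."""

    def __init__(self, error_response, operation_name):
        super().__init__(error_response["Error"]["Message"])
        self.response = error_response
        self.operation_name = operation_name


class NoCredentialsError(Exception):
    """Local stand-in for botocore.exceptions.NoCredentialsError (assumption)."""


class FakeS3Client:
    """Hypothetical client whose head_object raises the error it was given."""

    def __init__(self, error=None):
        self.error = error

    def head_object(self, Bucket, Key):
        if self.error is not None:
            raise self.error
        return {}


def check_for_key_broad(client, key, bucket_name):
    # Old behavior: a bare except hides every failure, including bad credentials.
    try:
        client.head_object(Bucket=bucket_name, Key=key)
        return True
    except:  # noqa: E722 (kept bare on purpose to mirror the original code)
        return False


def check_for_key_narrow(client, key, bucket_name):
    # Proposed behavior: only the "object not found" style error maps to False.
    try:
        client.head_object(Bucket=bucket_name, Key=key)
        return True
    except ClientError as e:
        log.info(e.response["Error"]["Message"])
        return False


not_found = ClientError({"Error": {"Code": "404", "Message": "Not Found"}}, "HeadObject")
no_creds = NoCredentialsError("Unable to locate credentials")

if __name__ == "__main__":
    print(check_for_key_broad(FakeS3Client(no_creds), "test", "test"))
    print(check_for_key_narrow(FakeS3Client(not_found), "test", "test"))
    try:
        check_for_key_narrow(FakeS3Client(no_creds), "test", "test")
    except NoCredentialsError as exc:
        print("raised:", exc)
```

The narrowed version still returns False for a missing key, but a credentials failure now propagates to the caller instead of being reported as a missing key.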



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
