[
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Micheal Ascah updated AIRFLOW-2771:
-----------------------------------
Fix Version/s: 1.10.0
> S3Hook Broad Exception Silent Failure
> -------------------------------------
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Affects Versions: 1.9.0
> Reporter: Micheal Ascah
> Assignee: Micheal Ascah
> Priority: Minor
> Labels: S3Hook, S3Sensor
> Fix For: 1.10.0, 2.0.0
>
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection ID (one that doesn't exist
> or has bad permissions). There are also no credentials under
> ~/.aws/credentials for boto to fall back on.
>
> When poking for the key, the sensor creates an S3Hook and calls
> `check_for_key` on it. If the call to HeadObject fails, the exception is
> caught by a bare except clause that swallows all exceptions, rather than only
> the expected botocore.exceptions.ClientError raised when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no problem with the task
> instance until it times out, instead of failing immediately when the
> connection is misconfigured. The current logging output gives no insight into
> why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> >>> from airflow.hooks.S3_hook import S3Hook
> >>> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> >>> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> {code}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> >>> from airflow.hooks.S3_hook import S3Hook
> >>> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> >>> hook.check_for_key(key="test", bucket_name="test")
> Traceback (most recent call last):
>   ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> >>> from airflow.hooks.S3_hook import S3Hook
> >>> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> >>> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> h4. Proposed Change
> Narrow the except clause to botocore.exceptions.ClientError, and log the error
> message, in both check_for_key and check_for_bucket on S3Hook.
> {code:python}
> from botocore.exceptions import ClientError
>
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     self.log.info(e.response["Error"]["Message"])
>     return False
> {code}
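The effect of narrowing the except clause can be sketched without AWS access using stand-in exception classes. The `ClientError` and `NoCredentialsError` classes below only mimic the shape of botocore's real exceptions (this sketch does not import botocore), and the `check_for_key_*` functions are hypothetical reductions of the hook method:

```python
# Stand-ins mimicking botocore.exceptions (illustration only; real code
# imports these from botocore.exceptions).
class ClientError(Exception):
    def __init__(self, response):
        super().__init__(response["Error"]["Message"])
        self.response = response

class NoCredentialsError(Exception):
    pass

def check_for_key_broad(head_object):
    """Current behaviour: a bare except swallows every failure."""
    try:
        head_object()
        return True
    except:  # noqa: E722 - intentionally broad, to show the bug
        return False

def check_for_key_narrow(head_object):
    """Proposed behaviour: only ClientError (e.g. a 404) returns False."""
    try:
        head_object()
        return True
    except ClientError as e:
        print(e.response["Error"]["Message"])
        return False

def missing_key():
    raise ClientError({"Error": {"Code": "404", "Message": "Not Found"}})

def missing_credentials():
    raise NoCredentialsError("Unable to locate credentials")

# A genuinely missing key returns False either way.
print(check_for_key_broad(missing_key))    # False
print(check_for_key_narrow(missing_key))   # prints "Not Found", then False

# A credentials problem is silently reported as "key not found" today...
print(check_for_key_broad(missing_credentials))  # False (silent failure)

# ...but propagates immediately with the narrowed clause.
try:
    check_for_key_narrow(missing_credentials)
except NoCredentialsError as e:
    print("raised:", e)
```

With the narrow clause the sensor still polls normally for absent keys, while configuration errors fail the task on the first poke instead of after the timeout.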
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)