[ https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micheal Ascah updated AIRFLOW-2771:
-----------------------------------

Description:

h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id (one that doesn't exist or has bad permissions), and there are no credentials under ~/.aws/credentials for boto to fall back on.

When poking for the key, the sensor creates an S3Hook and calls `check_for_key` on the hook. If the HeadObject call fails, the exception is swallowed by a bare except clause that catches all exceptions, rather than only the expected botocore.exceptions.ClientError raised when an object is not found.

h2. Problem

This causes the sensor to return False and report no issue with the task instance until it times out, rather than failing immediately when the connection is misconfigured. The current logging output gives no insight into why the key is not being found.

h4. Current code

{code:python}
try:
    self.get_conn().head_object(Bucket=bucket_name, Key=key)
    return True
except:  # <- This catches credential and connection exceptions that should be raised
    return False
{code}

{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}

{code}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
{code}

h4. Expected

h5. No credentials

{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
  ...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}

h5. Good credentials

{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}

h4. Proposed Change

Catch botocore.exceptions.ClientError explicitly and log its message in both `check_for_key` and `check_for_bucket` on S3Hook.

{code:python}
try:
    self.get_conn().head_object(Bucket=bucket_name, Key=key)
    return True
except ClientError as e:
    self.log.info(e.response["Error"]["Message"])
    return False
{code}
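The behavioral difference can be sketched without AWS access. This is a minimal stand-alone illustration, not the Airflow implementation: the exception classes and the `check_for_key`/`head_object` stand-ins below are hypothetical stubs shaped like the botocore exceptions involved.

```python
# Hypothetical stand-in shaped like botocore.exceptions.ClientError,
# which carries the service error details in a `response` dict.
class ClientError(Exception):
    def __init__(self, response):
        super().__init__(response["Error"]["Message"])
        self.response = response

# Hypothetical stand-in for botocore.exceptions.NoCredentialsError.
class NoCredentialsError(Exception):
    pass

def check_for_key(head_object):
    """Sketch of the proposed check: only ClientError means 'not found'."""
    try:
        head_object()
        return True
    except ClientError as e:
        # Missing key: log the service's message and report False.
        print(e.response["Error"]["Message"])
        return False

# Missing key -> ClientError is caught and the check returns False.
def missing_key():
    raise ClientError({"Error": {"Message": "Not Found"}})

# Bad connection -> NoCredentialsError propagates instead of a silent False.
def no_credentials():
    raise NoCredentialsError("Unable to locate credentials")

print(check_for_key(missing_key))  # prints "Not Found", then False
try:
    check_for_key(no_credentials)
except NoCredentialsError as e:
    print(e)  # the misconfiguration now fails loudly
```

With the narrow except clause, the sensor still gets False for a genuinely absent key, while configuration problems surface immediately as exceptions rather than an hour of uninformative pokes.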
> S3Hook Broad Exception Silent Failure
> -------------------------------------
>
>                 Key: AIRFLOW-2771
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.9.0
>            Reporter: Micheal Ascah
>            Assignee: Micheal Ascah
>            Priority: Minor
>              Labels: S3Hook, S3Sensor
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)