Hello,

We are using the S3Hook in several of our Airflow task instances (TIs) to
read data from and write data to S3.

We are creating an S3 connection from the UI with the options below.

Conn Id - s3
Conn Type - S3
Extra - {"aws_access_key_id":"key", "aws_secret_access_key": "key"}

In the pipeline code we use this connection as follows:

s3 = S3Hook(s3_conn_id='s3')

We are looking into other options for defining this connection, since having
the keys exposed in plain text like this is a security issue. We tried
defining only the connection id and connection type in the UI, without the
keys. In that case the tasks that read from S3 succeed, but the ones that
delete or create files/objects fail with a '403 Forbidden' error from S3. We
did some digging in the S3Hook code and found that if the keys are not in
the Extra parameter it falls back to the boto config, but that doesn't seem
to work in our case for reasons we have not been able to pin down.

All our other Python scripts on the system interact with S3 through the boto
config without any problems.
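
For reference, the boto config we are relying on is the standard one that
boto picks up on the box; the path and keys below are illustrative rather
than our actual values:

# /etc/boto.cfg (or ~/.boto), keys redacted
[Credentials]
aws_access_key_id = <key>
aws_secret_access_key = <key>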

1)
We need help understanding why the S3Hook isn't picking up the boto config.
Are we missing some other parameter that needs to be passed on this
connection?

2)
How do we define the S3 connection as an environment variable? We are
installing Airflow via Chef and would like an environment variable such as
AIRFLOW_CONN_S3 to be created for this connection, so that we don't have to
set it up manually in the UI every time we run the setup.
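
If we read the documentation correctly, the value of that variable has to be
the whole connection expressed as a URI. Our best guess at what Chef should
export is something like the below (keys redacted, exact format not
confirmed; presumably keys containing special characters would need to be
URL-encoded):

AIRFLOW_CONN_S3=s3://<aws_access_key_id>:<aws_secret_access_key>@S3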

Related to that: on S3 we access several different buckets with the same
credentials. Since the connection is a single URI, does that mean we have to
create one connection per bucket, or can one connection serve all buckets?
We did not find any examples of this anywhere, hence the question.
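
To make the question concrete, this is roughly what we are hoping a single
connection allows, assuming we are reading the hook's API correctly (bucket
and key names below are just placeholders):

from airflow.hooks.S3_hook import S3Hook

# One connection, several buckets: the bucket is passed per call,
# so the 's3' connection itself would not need to name a bucket.
s3 = S3Hook(s3_conn_id='s3')

# write to one bucket
s3.load_file(filename='/tmp/report.csv',
             key='reports/report.csv',
             bucket_name='bucket-a',
             replace=True)

# check a key in another bucket with the same hook
exists = s3.check_for_key(key='raw/input.json', bucket_name='bucket-b')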

Thanks,
Nadeem
