Thanks, Paul. Has anyone tried using native boto instead of the S3Hook in Airflow tasks (as a python_callable)? I tried it but get an "S3ResponseError: 403 Forbidden". I'm just wondering whether we are restricted to using only the S3Hook. The reason I want to use native boto is to avoid having to define the connection that the S3Hook needs.
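For context, this is roughly the shape of the callable I'm running (simplified; the bucket and key names are placeholders):

    import boto

    def upload_to_s3(**context):
        # No S3Hook or Airflow connection here; this relies on the boto
        # config (or instance profile) already present on the worker.
        conn = boto.connect_s3()
        bucket = conn.get_bucket('my-bucket')         # placeholder bucket
        key = bucket.new_key('path/to/object.txt')    # placeholder key
        key.set_contents_from_string('some data')     # the 403 comes back around here

As mentioned in the thread below, plain Python scripts that use the boto config on the same machines work without problems, which is what confuses me.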
Thanks,
Nadeem

On Tue, Jul 19, 2016 at 10:47 AM, Paul Minton <[email protected]> wrote:

> For 2) we're using something similar to
> https://gist.github.com/syvineckruyk/d2c96b418ed509a174e0e718cb62b20a
> to programmatically load Connection and Variable objects. Those scripts
> run with each chef-client run.
>
> On Tue, Jul 19, 2016 at 10:04 AM, Nadeem Ahmed Nazeer <[email protected]> wrote:
>
> > Thanks for the response, Paul. The crypto package would be my last
> > resort, but it would still not solve (2), where I am looking to create
> > the connection automatically.
> >
> > Awaiting further answers.
> >
> > Thanks,
> > Nadeem
> >
> > On Tue, Jul 19, 2016 at 9:03 AM, Paul Minton <[email protected]> wrote:
> >
> > > Nadeem, this doesn't directly answer either 1) or 2), but have you
> > > considered using the option "is_extra_encrypted"? This encrypts the
> > > extra JSON just as it does the rest of the credentials on the
> > > connection object (i.e. using a Fernet key and the encryption package).
> > >
> > > On Mon, Jul 18, 2016 at 10:00 PM, Nadeem Ahmed Nazeer <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'd appreciate it if someone could provide assistance on this.
> > > >
> > > > Thanks,
> > > > Nadeem
> > > >
> > > > On Fri, Jul 15, 2016 at 4:15 PM, Nadeem Ahmed Nazeer <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We are using the S3Hook in several of our Airflow task instances
> > > > > to read and write data from S3.
> > > > >
> > > > > We are creating an S3 connection from the UI with the options below.
> > > > >
> > > > > Conn Id - s3
> > > > > Conn Type - S3
> > > > > Extra - {"aws_access_key_id": "key", "aws_secret_access_key": "key"}
> > > > >
> > > > > In pipeline code we use this connection as below:
> > > > >
> > > > > s3 = S3Hook(s3_conn_id='s3')
> > > > >
> > > > > We are looking into other options for defining this connection, as
> > > > > it is a security issue to have the keys in the open like this. We
> > > > > tried defining the connection id and connection type alone in the
> > > > > UI, without the keys. In that case the tasks that read from S3
> > > > > succeed, but the ones that delete or create files/objects fail
> > > > > with a '403 Forbidden' error from S3. I did some digging in the
> > > > > S3Hook code and found that if the keys are not in the Extra
> > > > > parameter it falls back to the boto config, but that doesn't seem
> > > > > to work in my case, for reasons I am unable to find.
> > > > >
> > > > > All our other Python scripts interact with S3 using the boto
> > > > > config on the system without any problems.
> > > > >
> > > > > 1) I need help understanding why the S3Hook isn't using the boto
> > > > > config. Am I missing some other parameter I should pass to this
> > > > > connection?
> > > > >
> > > > > 2) How do I define the S3 connection as an environment variable?
> > > > > We are installing Airflow via Chef and would like an environment
> > > > > variable like AIRFLOW_CONN_S3 created for this connection, so that
> > > > > we don't have to set it up manually in the UI every time we run
> > > > > the setup.
> > > > >
> > > > > The documentation says the connection has to be in a URI format.
> > > > > On S3, I can access different buckets with the same connection.
> > > > > But since it has to be in URI format, does that mean I create one
> > > > > connection per bucket and use that?
> > > > > I did not find any examples of this anywhere, hence asking.
> > > > >
> > > > > Thanks,
> > > > > Nadeem
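P.S. On (2), my best guess at the env var form, based only on the generic AIRFLOW_CONN_* / URI convention in the docs (untested, and the key values are placeholders that would need URL-encoding), is something like:

    export AIRFLOW_CONN_S3=s3://AKIA_EXAMPLE_KEY:url-encoded-secret-key@S3

but I have not been able to confirm whether the S3Hook actually reads the login/password fields from a URI like this, or whether the keys still need to end up in the extra JSON somehow.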

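P.P.S. Paul, to make sure I'm reading the gist correctly: the idea is roughly to create the Connection rows through the ORM from a script that Chef runs, something along these lines? (My own paraphrase of the approach, not the gist itself; the conn_id and keys are placeholders.)

    from airflow import settings
    from airflow.models import Connection

    session = settings.Session()
    conn = Connection(
        conn_id='s3',
        conn_type='s3',
        extra='{"aws_access_key_id": "key", "aws_secret_access_key": "key"}')
    session.add(conn)
    session.commit()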