PS, if env vars are no longer supported for defining *any* Connections, the
documentation
<http://pythonhosted.org/airflow/configuration.html#connections> really
ought to be updated:

*Connections in Airflow pipelines can be created using environment
variables. The environment variable needs to have a prefix
of AIRFLOW_CONN_ for Airflow with the value in a URI format to use the
connection properly. Please see the Concepts
<http://pythonhosted.org/airflow/concepts.html> documentation for more
information on environment variables and connections.*
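
For what it's worth, the documented pattern looks roughly like this (a
sketch only; the connection name and credentials below are placeholders,
not anything from this thread):

import os

# Airflow picks up env vars named AIRFLOW_CONN_<CONN_ID> and parses the
# value as a URI: scheme -> conn_type, then login:password@host:port/schema.
os.environ['AIRFLOW_CONN_MY_POSTGRES'] = (
    'postgres://user:password@db.example.com:5432/mydb')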

On Mon, Jun 20, 2016 at 12:36 PM Tyrone Hinderson <[email protected]>
wrote:

> Thanks a lot, Jeremiah--this works for me.
>
> On Thu, Jun 16, 2016 at 2:48 PM Jeremiah Lowin <[email protected]> wrote:
>
>> Hi Tyrone,
>>
>> The motivation behind the change was to force *all* Airflow connections
>> (including those used for logging) to go through the UI where they can be
>> managed/controlled by an admin, and also to allow more fine-grained
>> permissioning.
>>
>> Fortunately, connections can be created programmatically with just a
>> couple of extra steps. I use a script similar to this one (below) to set
>> up all of the connections in our production environment after restarts.
>> I've made some changes to show how the keys could be taken from env
>> vars. You could run this script either as part of your own library or as
>> a plugin.
>>
>> I hope this helps and I'm sorry for the inconvenience!
>>
>>
>>
>> import json
>> import os
>>
>> import airflow
>> from airflow.models import Connection
>>
>> S3_CONN_ID = 's3_connection'
>>
>> if __name__ == '__main__':
>>     session = airflow.settings.Session()
>>
>>     # Check whether the connection already exists (None if it doesn't).
>>     s3_connection = (
>>         session.query(Connection)
>>         .filter(Connection.conn_id == S3_CONN_ID)
>>         .first())
>>
>>     if not s3_connection:
>>         print('Creating connection: {}'.format(S3_CONN_ID))
>>         # Read the AWS keys from environment variables and store them
>>         # in the connection's "extra" field as JSON.
>>         session.add(
>>             Connection(
>>                 conn_id=S3_CONN_ID,
>>                 conn_type='s3',
>>                 extra=json.dumps(dict(
>>                     aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
>>                     aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']))))
>>         session.commit()
>>         print('Done creating connections.')
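>>
>> Once a connection like that exists, hooks can find it by its conn_id. A
>> rough usage sketch (hedging: the import path, the s3_conn_id argument and
>> check_for_bucket are what I believe the 1.7-era S3Hook exposes, and the
>> bucket name is a placeholder):
>>
>> from airflow.hooks.S3_hook import S3Hook
>>
>> # Look up the connection created by the script above by its conn_id.
>> hook = S3Hook(s3_conn_id='s3_connection')
>> print(hook.check_for_bucket('some-bucket'))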
>>
>>
>> On Thu, Jun 16, 2016 at 11:01 AM Tyrone Hinderson
>> <[email protected]> wrote:
>>
>> > Hey Jakob,
>> >
>> > Thanks for your quick response. I doubt I can take your approach,
>> > because
>> >
>> >    1. It's imperative that the s3 connection be contained within an
>> >    environment variable
>> >    2. My scheduler is deployed on an AWS box which uses an IAM role to
>> >    connect to s3, not a credentials file.
>> >
>> > However, can you tell me where you got the idea to use that particular
>> > JSON? Might help with my quest for a solution.
>> >
>> > On Wed, Jun 15, 2016 at 8:00 PM Jakob Homan <[email protected]> wrote:
>> >
>> > > Hey Tyrone-
>> > >    I just set this up on 1.7.1.2 and found the documentation confusing
>> > > too.  Been meaning to improve the documentation.  To get S3 logging
>> > > configured I:
>> > >
>> > > (a) Set up an S3Connection (let's call it foo) with only the extra
>> > > param set to the following json:
>> > >
>> > > { "s3_config_file": "/usr/local/airflow/.aws/credentials",
>> > > "s3_config_format": "aws" }
>> > >
>> > > (b) Added a remote_log_conn_id key to the core section of airflow.cfg,
>> > > with a value of "foo" (my S3Connection name)
>> > >
>> > > (c) Added a remote_base_log_folder key to the core section of
>> > > airflow.cfg, with a value of "s3://where/i/put/my/logs"
>> > >
>> > > Everything worked after that.
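>> > >
>> > > For reference, steps (b) and (c) together would look roughly like
>> > > this in airflow.cfg (using my connection name and log path from
>> > > above):
>> > >
>> > > [core]
>> > > remote_log_conn_id = foo
>> > > remote_base_log_folder = s3://where/i/put/my/logs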
>> > >
>> > > -Jakob
>> > >
>> > > On 15 June 2016 at 15:35, Tyrone Hinderson <[email protected]>
>> > > wrote:
>> > > > @Jeremiah,
>> > > >
>> > > > http://pythonhosted.org/airflow/configuration.html#logs
>> > > >
>> > > > I used to log to s3 in 1.7.0, and my background .aws/credentials
>> > > > would take care of authenticating in the background. Now it appears
>> > > > that I need to set that "remote_log_conn_id" config field in order
>> > > > to continue logging to s3 in 1.7.1.2. Rather than create the
>> > > > connection in the web UI (afaik, impractical to do
>> > > > programmatically), I'd like to use an "AIRFLOW_CONN_"-style env
>> > > > variable. I've tried a URL like
>> > > > s3://[access_key_id]:[secret_key]@[bucket].s3-[region].amazonaws.com,
>> > > > but that hasn't worked:
>> > > >
>> > > > =====================================
>> > > > [2016-06-15 21:40:26,583] {base_hook.py:53} INFO - Using connection
>> > > > to: [bucket].s3-us-east-1.amazonaws.com
>> > > >
>> > > > [2016-06-15 21:40:26,583] {logging.py:57} ERROR - Could not create
>> > > > an S3Hook with connection id "S3_LOGS". Please make sure that
>> > > > airflow[s3] is installed and the S3 connection exists.
>> > > >
>> > > > =====================================
>> > > >
>> > > > It's clear that my connection exists because of the "Using
>> > > > connection to:" line. However, I fear that my connection URI string
>> > > > is malformed. Can you provide some guidance as to how I might
>> > > > properly form an s3 connection URI, since I mainly followed a
>> > > > mixture of Wikipedia's URI format
>> > > > <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples>
>> > > > and Amazon's s3 URI format
>> > > > <http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html>?
>> > > >
>> > > > On Tue, May 24, 2016 at 6:03 PM Jeremiah Lowin <[email protected]>
>> > > > wrote:
>> > > >
>> > > >> Where are you seeing that an S3 connection is required? It will
>> > > >> only be accessed if you tell Airflow to send logs to S3. The
>> > > >> config option can also be null (default) or a Google Storage
>> > > >> location.
>> > > >>
>> > > >> The S3 connection is a standard Airflow connection. If you would
>> > > >> like it to use environment variables or a boto config, it will --
>> > > >> but the connection object itself must be created in Airflow. See
>> > > >> the S3 hook for details.
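>> > > >>
>> > > >> A minimal sketch of that fall-through case (assuming, as I
>> > > >> understand the 1.7-era S3Hook, that a connection with no
>> > > >> credentials in "extra" lets boto fall back to env vars, IAM roles,
>> > > >> or .aws/credentials -- please verify against your version; the
>> > > >> conn_id is a placeholder):
>> > > >>
>> > > >> import airflow
>> > > >> from airflow.models import Connection
>> > > >>
>> > > >> session = airflow.settings.Session()
>> > > >> # A bare S3 connection: no keys in "extra", so credential lookup
>> > > >> # falls through to boto's usual chain.
>> > > >> session.add(Connection(conn_id='s3_default_creds', conn_type='s3'))
>> > > >> session.commit()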
>> > > >>
>> > > >>
>> > > >> On Tue, May 24, 2016 at 3:57 PM George Leslie-Waksman <
>> > > >> [email protected]> wrote:
>> > > >>
>> > > >> > We ran into this issue as well. If you set the environment
>> > > >> > variable to anything random, it'll get ignored and control will
>> > > >> > pass through to .aws/credentials.
>> > > >> >
>> > > >> > We used "n/a".
>> > > >> >
>> > > >> > It's kind of annoying that the s3 connection is a) required, and
>> > > >> > b) poorly supported as an env var.
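>> > > >> >
>> > > >> > Sketching that workaround for completeness -- the connection id
>> > > >> > and value here are placeholders, and in practice you'd export
>> > > >> > this in the scheduler/worker environment rather than setting it
>> > > >> > in Python:
>> > > >> >
>> > > >> > import os
>> > > >> >
>> > > >> > # Throwaway value: per the report above it gets ignored, and
>> > > >> > # boto falls back to .aws/credentials for the actual keys.
>> > > >> > os.environ['AIRFLOW_CONN_S3_LOGS'] = 'n/a'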
>> > > >> >
>> > > >> > On Tue, May 24, 2016 at 8:37 AM Tyrone Hinderson
>> > > >> > <[email protected]> wrote:
>> > > >> >
>> > > >> > > I was logging to S3 in 1.7.0, but now I need to create an S3
>> > > >> > > "Connection" in airflow (for remote_log_conn_id) to keep doing
>> > > >> > > that in 1.7.1.2. Rather than set this "S3" connection in the
>> > > >> > > UI, I'd like to set an AIRFLOW_CONN_S3 env variable. What does
>> > > >> > > an airflow-friendly s3 "connection string" look like?
>> > > >> >
>> > > >>
>> > >
>> >
>>
>
