PS, if env vars are no longer supported for defining *any* Connections, the documentation <http://pythonhosted.org/airflow/configuration.html#connections> really ought to be updated:
*Connections in Airflow pipelines can be created using environment variables. The environment variable needs to have a prefix of AIRFLOW_CONN_ for Airflow with the value in a URI format to use the connection properly. Please see the Concepts <http://pythonhosted.org/airflow/concepts.html> documentation for more information on environment variables and connections.*

On Mon, Jun 20, 2016 at 12:36 PM Tyrone Hinderson <[email protected]> wrote:

> Thanks a lot, Jeremiah--this works for me.
>
> On Thu, Jun 16, 2016 at 2:48 PM Jeremiah Lowin <[email protected]> wrote:
>
> > Hi Tyrone,
> >
> > The motivation behind the change was to force *all* Airflow connections
> > (including those used for logging) to go through the UI, where they can
> > be managed/controlled by an admin, and also to allow more fine-grained
> > permissioning.
> >
> > Fortunately, connections can be created programmatically with just a
> > couple of extra steps. I use a script similar to this one (below) to set
> > up all of the connections in our production environment after restarts.
> > I've made some changes to show how the keys could be taken from env
> > vars. You could run this script either as part of your own library or as
> > a plugin.
> >
> > I hope this helps, and I'm sorry for the inconvenience!
> >
> > import json
> > import os
> >
> > import airflow
> > from airflow.models import Connection
> >
> > S3_CONN_ID = 's3_connection'
> >
> > if __name__ == '__main__':
> >     session = airflow.settings.Session()
> >
> >     # check if the connection already exists
> >     # (first() returns None instead of raising when there is no row)
> >     s3_connection = (
> >         session.query(Connection)
> >         .filter(Connection.conn_id == S3_CONN_ID)
> >         .first())
> >
> >     if not s3_connection:
> >         print('Creating connection: {}'.format(S3_CONN_ID))
> >         session.add(
> >             Connection(
> >                 conn_id=S3_CONN_ID,
> >                 conn_type='s3',
> >                 extra=json.dumps(dict(
> >                     aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
> >                     aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY']))))
> >         session.commit()
> >         print('Done creating connections.')
> >
> > On Thu, Jun 16, 2016 at 11:01 AM Tyrone Hinderson <[email protected]> wrote:
> >
> > > Hey Jacob,
> > >
> > > Thanks for your quick response. I doubt I can take your approach,
> > > because
> > >
> > > 1. It's imperative that the s3 connection be contained within an
> > > environment variable
> > > 2. My scheduler is deployed on an AWS box which uses an IAM role to
> > > connect to s3, not a credentials file.
> > >
> > > However, can you tell me where you got the idea to use that particular
> > > JSON? It might help with my quest for a solution.
> > >
> > > On Wed, Jun 15, 2016 at 8:00 PM Jakob Homan <[email protected]> wrote:
> > >
> > > > Hey Tyrone-
> > > > I just set this up on 1.7.1.2 and found the documentation confusing
> > > > too. I've been meaning to improve it. To get S3 logging configured, I:
> > > >
> > > > (a) Set up an S3Connection (let's call it foo) with only the extra
> > > > param set to the following JSON:
> > > >
> > > > { "s3_config_file": "/usr/local/airflow/.aws/credentials",
> > > >   "s3_config_format": "aws" }
> > > >
> > > > (b) Added a remote_log_conn_id key to the core section of airflow.cfg,
> > > > with a value of "foo" (my S3Connection name)
> > > >
> > > > (c) Added a remote_base_log_folder key to the core section of
> > > > airflow.cfg, with a value of "s3://where/i/put/my/logs"
> > > >
> > > > Everything worked after that.
> > > >
> > > > -Jakob
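(Pulling Jakob's steps (b) and (c) together, the resulting airflow.cfg fragment would presumably look like the sketch below; "foo" and the bucket path are just his example values, not required names.)

    [core]
    # conn_id of the Airflow connection holding the S3 credentials (step b)
    remote_log_conn_id = foo
    # remote location that task logs get copied to (step c)
    remote_base_log_folder = s3://where/i/put/my/logs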
> > > > On 15 June 2016 at 15:35, Tyrone Hinderson <[email protected]> wrote:
> > > >
> > > > > @Jeremiah,
> > > > >
> > > > > http://pythonhosted.org/airflow/configuration.html#logs
> > > > >
> > > > > I used to log to s3 in 1.7.0, and my .aws/credentials file would
> > > > > take care of authenticating in the background. Now it appears that I
> > > > > need to set that "remote_log_conn_id" config field in order to
> > > > > continue logging to s3 in 1.7.1.2. Rather than create the connection
> > > > > in the web UI (afaik, impractical to do programmatically), I'd like
> > > > > to use an "AIRFLOW_CONN_"-style env variable. I've tried a URI like
> > > > > s3://[access_key_id]:[secret_key]@[bucket].s3-[region].amazonaws.com,
> > > > > but that hasn't worked:
> > > > >
> > > > > =====================================
> > > > > [2016-06-15 21:40:26,583] {base_hook.py:53} INFO - Using connection
> > > > > to: [bucket].s3-us-east-1.amazonaws.com
> > > > > [2016-06-15 21:40:26,583] {logging.py:57} ERROR - Could not create
> > > > > an S3Hook with connection id "S3_LOGS". Please make sure that
> > > > > airflow[s3] is installed and the S3 connection exists.
> > > > > =====================================
> > > > >
> > > > > It's clear that my connection exists because of the "Using
> > > > > connection to:" line. However, I fear that my connection URI string
> > > > > is malformed. Can you provide some guidance as to how I might
> > > > > properly form an s3 connection URI, since I mainly followed a
> > > > > mixture of wikipedia's URI format
> > > > > <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples>
> > > > > and amazon's s3 URI format
> > > > > <http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html>?
> > > > >
> > > > > On Tue, May 24, 2016 at 6:03 PM Jeremiah Lowin <[email protected]> wrote:
> > > > >
> > > > > > Where are you seeing that an S3 connection is required? It will
> > > > > > only be accessed if you told Airflow to send logs to S3. The
> > > > > > config option can also be null (default) or a google storage
> > > > > > location.
> > > > > >
> > > > > > The S3 connection is a standard Airflow connection. If you would
> > > > > > like it to use environment variables or a boto config, it will --
> > > > > > but the connection object itself must be created in Airflow. See
> > > > > > the S3 hook for details.
> > > > > >
> > > > > > On Tue, May 24, 2016 at 3:57 PM George Leslie-Waksman <[email protected]> wrote:
> > > > > >
> > > > > > > We ran into this issue as well. If you set the environment
> > > > > > > variable to anything random, it'll get ignored and control will
> > > > > > > pass through to .aws/credentials
> > > > > > >
> > > > > > > We used "n/a"
> > > > > > >
> > > > > > > It's kind of annoying that the s3 connection is a) required, and
> > > > > > > b) poorly supported as an env var.
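(George's workaround, as a minimal Python sketch. The conn id S3_LOGS is hypothetical, and the later messages in this thread question whether 1.7.1.2 still honors env-var connections at all, so treat this as his report rather than a guarantee.)

    import os

    # Per George: any value satisfies the "connection exists" check and is
    # then ignored, so boto falls back to its normal credential chain
    # (.aws/credentials or an IAM role). Must be set before Airflow starts.
    os.environ['AIRFLOW_CONN_S3_LOGS'] = 'n/a'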
> > > > > > > On Tue, May 24, 2016 at 8:37 AM Tyrone Hinderson <[email protected]> wrote:
> > > > > > >
> > > > > > > > I was logging to S3 in 1.7.0, but now I need to create an S3
> > > > > > > > "Connection" in airflow (for remote_log_conn_id) to keep doing
> > > > > > > > that in 1.7.1.2. Rather than set this "S3" connection in the
> > > > > > > > UI, I'd like to set an AIRFLOW_CONN_S3 env variable. What does
> > > > > > > > an airflow-friendly s3 "connection string" look like?
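(For what it's worth, Airflow connection URIs generally follow the pattern conn_type://login:password@host:port/schema, with the credentials in the login/password slots rather than the host; the bucket belongs in remote_base_log_folder, not in the connection. A hypothetical, untested-on-1.7.1.2 sketch of what the env var might look like -- the key values are placeholders:)

    import os

    # Hypothetical values. The secret must be URL-encoded if it contains
    # characters like '/' or '+'. The host portion can be left empty,
    # since the bucket is specified via remote_base_log_folder instead.
    os.environ['AIRFLOW_CONN_S3'] = 's3://AKIAEXAMPLEKEY:url-encoded-secret@'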
