Hi Dan,

I discussed this briefly with one of the security architects here. We think you can get a fair trade-off between security and usability by submitting a kind of manifest along with the DAG. The manifest can then specify what the generated tasks/DAGs are allowed to do and what metadata to provide to them. We could also let the scheduler generate a hash per generated DAG/task and verify it against an established version (1st run?). This limits the attack surface.
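To make the manifest-plus-hash idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the manifest layout, the `allowed_conn_ids` field, and the helper names `fingerprint`, `verify` and `may_use` are hypothetical and not part of Airflow. It also assumes the generated DAG can be described as plain declarative data, and it treats the first hash seen as the established version.

```python
import hashlib
import json

# Hypothetical manifest submitted with a DAG file: for each generated DAG,
# the connections it may use. The field names are assumptions.
MANIFEST = {
    "dag_a": {"allowed_conn_ids": ["hive_default"]},
    "dag_b": {"allowed_conn_ids": []},
}


def fingerprint(dag_definition):
    """Return a SHA-256 digest of a declarative DAG description."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(dag_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(dag_id, dag_definition, known_hashes):
    """Record the hash on the first run, compare against it afterwards."""
    digest = fingerprint(dag_definition)
    established = known_hashes.setdefault(dag_id, digest)
    return digest == established


def may_use(dag_id, conn_id, manifest=MANIFEST):
    """Check whether the manifest lets this DAG use the given connection."""
    entry = manifest.get(dag_id, {})
    return conn_id in entry.get("allowed_conn_ids", [])
```

In this sketch the scheduler would call `verify` each time it generates a DAG and `may_use` before handing connection details to a task. A legitimate change to a DAG would need an explicit step to re-establish its hash, which is not shown here.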
A DagSerializer would be great, but I think it solves a different issue, and the above is somewhat simpler to implement?

Bolke

> On 29 Jul 2018, at 23:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>
> *Let’s say we trust the owner field of the DAGs I think we could do the
> following.*
> *Obviously, the trusting the user part is key here. It is one of the
> reasons I was suggesting using “airflow submit” to update / add dags in
> Airflow*
>
> *This is the hard part about my question.*
> I think in a true multi-tenant environment we wouldn't be able to trust the
> user, otherwise we wouldn't necessarily even need a mapping of Airflow DAG
> users to secrets, because if we trust users to set the correct Airflow user
> for DAGs, we are basically trusting them with all of the creds the Airflow
> scheduler can access for all users anyways.
>
> I actually had the same thought as your "airflow submit" a while ago, which
> I discussed with Alex, basically creating an API for adding DAGs instead of
> having the Scheduler parse them. FWIW I think it's superior to the git time
> machine approach because it's a more generic form of "serialization" and is
> more correct as well because the same DAG file parsed on a given git SHA
> can produce different DAGs. Let me know what you think, and maybe I can
> start a more formal design doc if you are onboard:
>
> A user or service with an auth token sends an "airflow submit" request to a
> new kind of Dag Serialization service, along with the serialized DAG
> objects generated by parsing on the client. It's important that these
> serialized objects are declarative and not e.g. pickles so that the
> scheduler/workers can consume them and reproducibility of the DAGs is
> guaranteed. The service will then store each generated DAG along with its
> access based on the provided token e.g. using Ranger, and the
> scheduler/workers will use the stored DAGs for scheduling/execution.
> Operators would be deployed along with the Airflow code separately from the
> serialized DAGs.
>
> A serialized DAG would look something like this (basically Luigi-style :)):
> MyTask - BashOperator: {
>   cmd: "sleep 1"
>   user: "Foo"
>   access: "token1", "token2"
> }
>
> MyDAG: {
>   MyTask1 >> SomeOtherTask1
>   MyTask2 >> SomeOtherTask1
> }
>
> Dynamic DAGs in this case would just consist of a service calling "Airflow
> Submit" that does its own form of authentication to get access to some
> kind of tokens (or basically just forwarding the secrets the users of the
> dynamic DAG submit).
>
> For the default Airflow implementation you can maybe just have the Dag
> Serialization server bundled with the Scheduler, with auth turned off, and
> have it periodically update the Dag Serialization store, which would emulate
> the current behavior closely.
>
> Pros:
> 1. Consistency across running task instances in a dagrun/scheduler,
> reproducibility and auditability of DAGs
> 2. Users can control when to deploy their DAGs
> 3. Scheduler runs much faster since it doesn't have to run python files and
> e.g. make network calls
> 4. Scaling the scheduler becomes easier because you can have a different
> service responsible for parsing DAGs, which can be trivially scaled
> horizontally (clients are doing the parsing)
> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs
> easier? e.g. can use the Scheduler itself to schedule backfills with a
> slightly modified serialized version of a DAG.
>
> Cons:
> 1. Have to deprecate a lot of popular features, e.g. allowing custom
> callbacks in operators (e.g. on_failure), and jinja_templates
> 2. Version compatibility problems, e.g. user/service client might be
> serializing arguments for hooks/operators that have been deprecated in
> newer versions of the hooks, or the serialized DAG schema changes and old
> DAGs aren't automatically updated. Might want to have some kind of
> versioning system for serialized DAGs to at least ensure that stored DAGs
> are valid when the Scheduler/Worker/etc are upgraded, maybe something
> similar to thrift/protobuf versioning.
> 3. Additional complexity - additional service, logic on workers/scheduler
> to fetch/cache serialized DAGs efficiently, expiring/archiving old DAG
> definitions, etc
>
>
> On Sun, Jul 29, 2018 at 3:20 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> Ah gotcha. That’s another issue actually (but related).
>>
>> Let’s say we trust the owner field of the DAGs I think we could do the
>> following. We then have a table (and interface) to tell Airflow what users
>> have access to what connections. The scheduler can then check if the task
>> in the dag can access the conn_id it is asking for. Auto generated dags
>> still have an owner (or should) and therefore should be fine. Some
>> integrity checking could/should be added as we want to be sure that the
>> task we schedule is the task we launch. So a signature calculated at the
>> scheduler (or part of the DAG), sent as part of the metadata and checked by
>> the executor is probably smart.
>>
>> You can also make this more fancy by integrating with something like
>> Apache Ranger that allows for policy checking.
>>
>> Obviously, the trusting the user part is key here. It is one of the
>> reasons I was suggesting using “airflow submit” to update / add dags in
>> Airflow. We could enforce authentication on the DAG. It was kind of ruled
>> out in favor of git time machines although these never happened afaik ;-).
>>
>> BTW: I have updated my implementation with protobuf. Metadata is now
>> available at executor and task.
>>
>>
>>> On 29 Jul 2018, at 15:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>
>>> The concern is how to secure secrets on the scheduler such that only
>>> certain DAGs can access them, and in the case of files that create DAGs
>>> dynamically, only some set of DAGs should be able to access these secrets.
>>>
>>> e.g. if there is a secret/keytab that can be read by DAG A generated by
>>> file X, and file X generates DAG B as well, there needs to be a scheme to
>>> stop the parsing of DAG B on the scheduler from being able to read the
>>> secret in DAG A.
>>>
>>> Does that make sense?
>>>
>>> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>
>>>> I’m not sure what you mean. The example I created allows for dynamic DAGs,
>>>> as the scheduler obviously knows about the tasks when they are ready to be
>>>> scheduled.
>>>> This isn’t any different from a static DAG or a dynamic one.
>>>>
>>>> For Kerberos it isn't that special. Basically, a keytab is the user's
>>>> revocable credentials in a special format. The keytab itself can be
>>>> protected by a password. So I can imagine that a connection is defined
>>>> that sets a keytab location and password to access the keytab.
>>>> The scheduler understands this (or maybe the Connection model) and
>>>> serializes and sends it to the worker as part of the metadata. The worker
>>>> then reconstructs the keytab and issues a kinit or supplies it to the
>>>> other service requiring it (e.g. Spark).
>>>>
>>>> * Obviously the worker and scheduler need to communicate over SSL.
>>>> * There is a challenge at the worker level. Credentials are secured
>>>> against other users, but are readable by the owning user. So imagine 2 DAGs
>>>> from two different users with different connections without sudo
>>>> configured. If they end up at the same worker and DAG 2 is malicious, it
>>>> could read files and memory created by DAG 1. This is the reason why using
>>>> environment variables is NOT safe (DAG 2 could read /proc/<pid>/environ).
>>>> To mitigate this we probably need to PIPE the data to the task’s STDIN. It
>>>> won’t solve the issue but will make it harder as now it will only be in
>>>> memory.
>>>> * The reconstructed keytab (or the initialized version) can be stored in,
>>>> most likely, the process-keyring (
>>>> http://man7.org/linux/man-pages/man7/process-keyring.7.html). As
>>>> mentioned earlier this poses a challenge for Java applications that cannot
>>>> read from this location (keytab and ccache). Writing it out to the
>>>> filesystem then becomes a possibility. This is essentially the same as how
>>>> Spark solves it (
>>>> https://spark.apache.org/docs/latest/security.html#yarn-mode).
>>>>
>>>> Why not work on this together? We need it as well. Airflow as it is now we
>>>> consider the biggest security threat and it is really hard to secure it.
>>>> The above would definitely be a serious improvement. Another step would be
>>>> to stop Tasks from accessing the Airflow DB altogether.
>>>>
>>>> Cheers
>>>> Bolke
>>>>
>>>>> On 29 Jul 2018, at 05:36, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>
>>>>> This makes sense, and thanks for putting this together. I might pick this
>>>>> up myself depending on if we can get the rest of the multi-tenancy story
>>>>> nailed down, but I still think the tricky part is figuring out how to allow
>>>>> dynamic DAGs (e.g. DAGs created from rows in a MySQL table) to work with
>>>>> Kerberos, curious what your thoughts are there. How would secrets be passed
>>>>> securely in a multi-tenant Scheduler starting from parsing the DAGs up to
>>>>> the executor sending them off?
>>>>>
>>>>> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>
>>>>>> Here:
>>>>>>
>>>>>> https://github.com/bolkedebruin/airflow/tree/secure_connections
>>>>>>
>>>>>> Is a working rudimentary implementation that allows securing the
>>>>>> connections (only LocalExecutor at the moment)
>>>>>>
>>>>>> * It enforces the use of “conn_id” instead of the mix that we have now
>>>>>> * A task if using “conn_id” has ‘auto-registered’ (which is a noop) its
>>>>>> connections
>>>>>> * The scheduler reads the connection information and serializes it to
>>>>>> json (which should be a different format, protobuf preferably)
>>>>>> * The scheduler then sends this info to the executor
>>>>>> * The executor puts this in the environment of the task (environment most
>>>>>> likely not secure enough for us)
>>>>>> * The BaseHook reads out this environment variable and does not need to
>>>>>> touch the database
>>>>>>
>>>>>> The example_http_operator works, I haven't tested any other. To make it
>>>>>> work I just adjusted the hook and operator to use “conn_id” instead
>>>>>> of the non standard http_conn_id.
>>>>>>
>>>>>> Makes sense?
>>>>>>
>>>>>> B.
>>>>>>
>>>>>> * The BaseHook is adjusted to not connect to the database
>>>>>>> On 28 Jul 2018, at 17:50, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>
>>>>>>> Well, I don’t think a hook (or task) should obtain it by itself. It
>>>>>>> should be supplied.
>>>>>>> At the moment you start executing the task you cannot trust it anymore
>>>>>>> (i.e. it is unmanaged / non airflow code).
>>>>>>>
>>>>>>> So we could change the basehook to understand supplied credentials and
>>>>>>> populate a hash with “conn_ids”. Hooks normally call
>>>>>>> BaseHook.get_connection anyway, so it shouldn't be too hard and should
>>>>>>> in principle not require changes to the hooks themselves if they are
>>>>>>> well behaved.
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>>> On 28 Jul 2018, at 17:41, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>>>>
>>>>>>>> *So basically in the scheduler we parse the dag. Either from the manifest
>>>>>>>> (new) or from smart parsing (probably harder, maybe some auto register?) we
>>>>>>>> know what connections and keytabs are available dag wide or per task.*
>>>>>>>> This is the hard part that I was curious about, for dynamically created
>>>>>>>> DAGs, e.g. those generated by reading tasks in a MySQL database or a json
>>>>>>>> file, there isn't a great way to do this.
>>>>>>>>
>>>>>>>> I 100% agree with deprecating the connections table (at least for the
>>>>>>>> secure option). The main work there is rewriting all hooks to take
>>>>>>>> credentials from arbitrary data sources by allowing a customized
>>>>>>>> CredentialsReader class. Although hooks are technically private, I think a
>>>>>>>> lot of companies depend on them so the PMC should probably discuss if this
>>>>>>>> is an Airflow 2.0 change or not.
>>>>>>>>
>>>>>>>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sure. In general I consider keytabs as a part of connection information.
>>>>>>>>> Connections should be secured by sending the connection information a task
>>>>>>>>> needs as part of information the executor gets. A task should then not need
>>>>>>>>> access to the connection table in Airflow. Keytabs could then be sent as
>>>>>>>>> part of the connection information (base64 encoded) and set up by the
>>>>>>>>> executor (this key) to be read only to the task it is launching.
>>>>>>>>>
>>>>>>>>> So basically in the scheduler we parse the dag. Either from the manifest
>>>>>>>>> (new) or from smart parsing (probably harder, maybe some auto register?) we
>>>>>>>>> know what connections and keytabs are available dag wide or per task.
>>>>>>>>>
>>>>>>>>> The credentials and connection information then are serialized into a
>>>>>>>>> protobuf message and sent to the executor as part of the “queue” action.
>>>>>>>>> The worker then deserializes the information and makes it securely
>>>>>>>>> available to the task (which is quite hard btw).
>>>>>>>>>
>>>>>>>>> On that last bit making the info securely available might be storing it in
>>>>>>>>> the Linux KEYRING (supported by python keyring). Keytabs will be tough to
>>>>>>>>> do properly due to Java not properly supporting KEYRING and only files, and
>>>>>>>>> these are hard to make secure (due to the possibility a process will list
>>>>>>>>> all files in /tmp and get credentials through that). Maybe storing the
>>>>>>>>> keytab with a password and having the password in the KEYRING might work.
>>>>>>>>> Something to find out.
>>>>>>>>>
>>>>>>>>> B.
>>>>>>>>>
>>>>>>>>> Sent from my iPad
>>>>>>>>>
>>>>>>>>>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>>>>>>
>>>>>>>>>> I'm curious if you had any ideas on how to enable multi-tenancy
>>>>>>>>>> with respect to Kerberos in Airflow.
>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Cool. The doc will need some refinement as it isn't entirely accurate. In
>>>>>>>>>>> addition we need to separate between Airflow as a client of kerberized
>>>>>>>>>>> services (this is what is talked about in the astronomer doc) vs
>>>>>>>>>>> kerberizing airflow itself, which the API supports.
>>>>>>>>>>>
>>>>>>>>>>> In general to access kerberized services (airflow as a client) one needs
>>>>>>>>>>> to start the ticket renewer with a valid keytab. For the hooks it isn't
>>>>>>>>>>> always required to change the hook to support it. Hadoop cli tools often
>>>>>>>>>>> just pick it up as their client config is set to do so. Then another class
>>>>>>>>>>> is there for HTTP-like services which are accessed by urllib under the
>>>>>>>>>>> hood, these typically use SPNEGO. These often need to be adjusted as it
>>>>>>>>>>> requires some urllib config. Finally, there are protocols which use SASL
>>>>>>>>>>> with kerberos. Like HDFS (not webhdfs, that uses SPNEGO). These require per
>>>>>>>>>>> protocol implementations.
>>>>>>>>>>>
>>>>>>>>>>> Off the top of my head we support kerberos client side now with:
>>>>>>>>>>>
>>>>>>>>>>> * Spark
>>>>>>>>>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>>>>>>>>>>> implementation)
>>>>>>>>>>> * Hive (not metastore afaik)
>>>>>>>>>>>
>>>>>>>>>>> A few things to remember:
>>>>>>>>>>>
>>>>>>>>>>> * If a job (i.e. Spark job) will finish later than the maximum ticket
>>>>>>>>>>> lifetime you probably need to provide a keytab to said application.
>>>>>>>>>>> Otherwise you will get failures after the expiry.
>>>>>>>>>>> * A keytab (used by the renewer) is a set of credentials (user and pass) so
>>>>>>>>>>> jobs are executed under the keytab in use at that moment
>>>>>>>>>>> * Securing keytab in multi tenancy airflow is a challenge. This also goes
>>>>>>>>>>> for securing connections. This we need to fix at some point. Solution for
>>>>>>>>>>> now seems to be no multi tenancy.
>>>>>>>>>>>
>>>>>>>>>>> Kerberos seems harder than it is btw. Still, we are sometimes moving away
>>>>>>>>>>> from it to OAUTH2 based authentication. This gets us closer to cloud
>>>>>>>>>>> standards (but we are on prem)
>>>>>>>>>>>
>>>>>>>>>>> B.
>>>>>>>>>>>
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>
>>>>>>>>>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Taylor
>>>>>>>>>>>>
>>>>>>>>>>>> +1 on upstreaming this. It would be great if you can submit a pull request
>>>>>>>>>>>> to enhance the apache airflow docs.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks
>>>>>>>>>>>> Hitesh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> While we're on the topic, I'd love any feedback from Bolke or others who've
>>>>>>>>>>>>> used Kerberos with Airflow on this quick guide I put together yesterday.
>>>>>>>>>>>>> It's similar to what's in the Airflow docs but instead all on one page
>>>>>>>>>>>>> and slightly expanded.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>>>>>>>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>>>>>>>>>>>
>>>>>>>>>>>>> One thing I'd like to add is a minimal example of how to Kerberize a hook.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts >
>>>>>>>>>>>>> Additional Functionality > Kerberos page?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Taylor
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Taylor Edmiston*
>>>>>>>>>>>>> Blog <https://blog.tedmiston.com/> | CV
>>>>>>>>>>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>>>>>>>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>>>>>>>>>> <https://angel.co/taylor> | Stack Overflow
>>>>>>>>>>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ry,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos and
>>>>>>>>>>>>>> he also did the implementation for Airflow. Besides that, he also worked
>>>>>>>>>>>>>> on implementing Kerberos in Ambari. Just want to let you know.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone -
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We have several bigCo's who are considering using Airflow asking into its
>>>>>>>>>>>>>>> support for Kerberos.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We're going to work on a proof-of-concept next week, will likely record a
>>>>>>>>>>>>>>> screencast on it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For now, we're looking for any anecdotal information from organizations who
>>>>>>>>>>>>>>> are using Kerberos with Airflow, if anyone would be willing to share their
>>>>>>>>>>>>>>> experiences here, or reply to me personally, it would be greatly
>>>>>>>>>>>>>>> appreciated!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Ry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/> |
>>>>>>>>>>>>>>> 513.417.2163 |
>>>>>>>>>>>>>>> @rywalker <http://twitter.com/rywalker> | LinkedIn
>>>>>>>>>>>>>>> <http://www.linkedin.com/in/rywalker>