Hi all,

Here are the meeting notes. Please add/modify as needed.
I also have a recording of the meeting. It's in two files due to a Webex issue that required a restart: Video 1 <https://airbnb.box.com/s/8b2c5691ux9qmkujum813xyscyol6rod>, Video 2 <https://airbnb.box.com/s/m1uxmkzcedd79f9fwd0s50hcda6hirmc>

Cheers,
Gurer

*Attendees:*
- Gurer Kiratli
- Paul Yang
- Maxime Beauchemin
- Ben Tallman
- George Leslie-Waksman
- Vijay Bhat
- Rob Froetscher
- Xuanji Li
- Bolke de Bruin
- Catherine Wong
- Sumit Maheshwari
- Joe Schmid
- Julia Hsieh
- Sam

*Agenda:*
- Airbnb Update
- Get an Apache 1.7.3 release out
- Eliminating cold-case PRs
  - So far, only Jeremiah and I have added to the list below. We need help from all committers: https://cwiki.apache.org/confluence/display/AIRFLOW/Whittling+down+PR+List
- Roadmap

*Meeting Notes*
- Airflow
  - 60 companies, 200 contributors, a lot of interest in meetups.
  - Growing faster than other projects like Luigi, Oozie, Azkaban.
- Airbnb Update
  - Cgroups for containment. Bad tasks don’t take down workers.
  - Impersonation. All jobs run under a service account called airflow_user, which creates accounting and ownership problems.
  - Creating a new cluster in a different data center in the world.
- ING update
  - Moving towards cross data center availability. Airflow might play a critical role here. ReAir might be useful.
  - Cgroups is interesting.
  - Integration with legacy technologies with APIs is key.
- Apache Release
  - Next week we are going to cut a release candidate.
  - Bolke, Max and some folks were working on going from 1.7.1.3 to 1.7.2, cherry-picking only PRs related to Apache compliance. This is mostly to test out the process. 1.7.2 will essentially be just 1.7.1.3; there is no real need to install it.
  - ETA is end of next week.
  - We will have the RC for 1.8.0. We will announce it through the Apache mailing list. We should have other folks test as well.
  - We need to release every month.
  - Each organization has to have a pre-prod/staging environment.
  - We can have an Apache staging environment.
- There are plugins and niche operators for every company. How are we going to handle them?
  - Maybe we can modularize the scheduler, UI, etc.
  - If people want certain functionality, it can be introduced as a plugin or module.
  - Need to decouple components.
  - We need to design this plugin architecture; easier said than done. There are already JIRAs for this: JIRA1 <https://issues.apache.org/jira/browse/AIRFLOW-299>, JIRA2 <https://issues.apache.org/jira/browse/AIRFLOW-226>.
- Airflow 2.0?
  - We might want to have a major fork out.
  - This might mean breaking backwards compatibility, repackaging.
  - This lets us hold some features back from 1.x and slate them for 2.0.
  - We can break the operators into sub-packages.
  - Stateless webservers could be in this.
  - The DSL for defining pipelines should be backwards compatible. Old DAGs should work.
  - We can use Git hashes to get the versions of DAGs.
  - We can have a field in the DAG that specifies which Airflow version the DAG was designed for.
- Cold PRs
  - We want more committers. More interaction gets you closer to being a committer.
  - It will be easier to review more PRs.
  - Lots of PRs need rebasing and would have conflicts.
  - If you send a PR that touches the core, and is hence “dangerous”, it will require much more scrutiny. Get buy-in from the committers beforehand; otherwise your work might not be committed at all. Having a design doc is a good idea, or your PR will be treated as a design doc. : ) If we trust you because you have already had PRs committed, there is more confidence and so more chance of being reviewed. All PRs have to be linted, and testing needs to converge.
  - Testing is unstable. Travis is flapping. Some tests have some randomness. This has to be fixed, because your PR might be good but some test will fail, which is misleading.
  - We have very limited control over the GitHub repo. Apache owns it. They will not give us admin rights. We can’t introduce 3rd-party services as we are not admins of the repo.
    - [Andrew Phillips] In jclouds <https://github.com/jclouds/jclouds/> we have a GitHub-hosted mirror. We can do whatever we want in this repo. We do CI through it. We can consider this in Airflow too. "Read-only mirror of ASF Git Repo for jclouds http://jclouds.apache.org/"
    - Maybe make a GitHub organization like Airflow-Airflow or something.
  - Can we have a policy on PRs that have been open for n weeks? They would be closed. Maybe the policy could be that once a PR hits a certain age, we ask for a rebase, and if we don’t hear back for n days we close it.
    - We might be able to automate this. Rails does it in a good way. But at this stage we might do it manually.
    - Let’s have a policy and put it out in the wiki. Also clarify ownership of components.
- For different requests how do we handle communication?
  - Kill JIRA. Use GitHub Issues. Apache allows it. GitHub Issues is easier to search.
  - Gitter is not super helpful. It’s more for ad hoc communication like Slack.
  - The dev mailing list would be for generic questions and requests. We have to have this. Apache enforces this. All decisions have to be made through the Apache mailing list for legal reasons. Apache doesn’t want secret decisions. We can have a mirror Google group that is subscribed to this. We can possibly have multiple Apache mailing lists.
  - Mail is slower but mitigates the issue of being spread across different timezones.
  - We can break out tickets into Newbie tickets and Projects. Projects will be sponsored by
  - Differentiate between level of interest and level of desire to commit.

Q4, 2017 Vision / Roadmap

We will create a wiki page with these and see interest.

Possible Deliverables
- Integration Testing Environment(s)
- Modularizing Airflow
- Containers (Docker, Kubernetes, ECS)
  - How do we handle package management for the DAG, its dependencies, and the environment?
  - Running on your laptop is an important thing to keep. This helped the project a lot. This shouldn’t be let go.
- Multitenancy
- Security Improvements
  - UI and CLI level roles.
  - Kerberos support.
  - Putting the password information in a Vault. Currently if you have the key you have access to the whole vault.
- Managing connection pools better.
  - Defining the difference between a hook and a connection.
- Stateless Webservers
  - This can help UI improvements.
- UI improvements
  - Performance, like better asset caching, compression and use of a CDN.
  - Reactify.
  - Support for large DAGs.
    - This might be dependent on the API.
- Managing logs
- Tighter dev, test, deploy story
  - The local environment and production environment need to be in line.
  - DAG.validate method.
- REST API
  - Outside interface that other applications integrate with, e.g. to trigger a task.
  - Internal service APIs to get execution details. To abstract the database.
  - It’s good to see who is interested in these features (cares about them) and who is driving them.
- Hardening the scheduler. Stabilizing.
  - Documenting how the scheduler operates.
  - Running a single task continuously.
  - More visibility into why we need to restart the scheduler.
  - Clarifying the contract between the scheduler and the workers (???)
  - Stuck scheduler issues.
  - Backfill has a separate code path. It should be a flavor.
- Better documentation
  - Onboarding documentation
  - Runbook
  - How the system works?
- Event driven scheduler
  - First task snowballing and kicking off other tasks. This reduces the white space between when a task is runnable and when it is actually run.
- Revamp the SubDAG operator
  - So many special cases in the code base.
  - It should be handled by the scheduler.
- Having a task name, separate from task id. Same for the DAG name and DAG id.
- Remove pickling.
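
For reference, the cold-PR policy floated in the notes (once a PR hits a certain age, ask for a rebase; if nobody responds within n days, close it) could be sketched roughly as below. This is only an illustration in Python, not an agreed policy or an existing tool. The `PullRequest` record, the `next_action` function, and the 8-week / 14-day thresholds are all hypothetical stand-ins, since the meeting left both "n" values open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds; the meeting left both "n" values undecided.
MAX_AGE = timedelta(weeks=8)
GRACE_PERIOD = timedelta(days=14)


@dataclass
class PullRequest:
    """Minimal stand-in for a PR record (not a real GitHub API object)."""
    opened_at: datetime
    rebase_requested_at: Optional[datetime] = None
    responded_after_request: bool = False


def next_action(pr: PullRequest, now: datetime) -> str:
    """Return 'keep', 'ask_rebase', or 'close' for a single open PR."""
    if pr.rebase_requested_at is None:
        # No rebase requested yet: only act once the PR is old enough.
        return "ask_rebase" if now - pr.opened_at >= MAX_AGE else "keep"
    if pr.responded_after_request:
        # The author answered the request, so the PR stays open for review.
        return "keep"
    # A rebase was requested and nobody answered: close after the grace period.
    return "close" if now - pr.rebase_requested_at >= GRACE_PERIOD else "keep"
```

A function like this could be run by hand over the open-PR list at first, which matches the suggestion to apply the policy manually before automating it.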
