I have two points:
1. The SLA feature in Airflow is very weird and limited. It doesn't really give you SLAs the way you would expect.
2. The dashboards in Airflow are very basic, which forces users to build their own dashboards with external tools. You have no way to know which DAG is draining your resources.


16.06.2021, 11:09, "Valeriy Solovyov" <[email protected]>:
1. I find working with connections in a multi-region environment very hard:
We have a DAG that works with some cloud resources (for example S3).
If the DAG hits an API limit or an error, I cannot fall back to another connection because I don't know whether the issue is with the cloud provider or with the data itself.

So we have to orchestrate the creation of connections from outside, based on errors, while the error handling itself happens inside the DAG.

2. I find implementing security in connections very hard:

Airflow generates a new AWS STS session (1 hour long) that we pass to the Docker container, where we assume the IAM role from the STS session.
Apache Spark then uses the assumed role to read from and write to the S3 bucket (for approximately 2 hours) and fails in the middle because the session has expired.
In the end, if we use HashiCorp Vault to get a temporary password/token to interact with an external API, we cannot use it because it will expire (after 2 hours).
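
One common way around the expiry problem in point 2 is to re-fetch credentials shortly before they lapse instead of issuing them once when the task starts. The following is only a minimal sketch of that idea, not an Airflow feature: `Credentials`, `RefreshingCredentials` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever call actually issues the STS session or Vault token. It also assumes the long-running process is able to ask for a fresh token, which may not hold for a Spark job inside a container.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    token: str
    expires_at: float  # Unix timestamp, in seconds


class RefreshingCredentials:
    """Hands out a token and re-fetches it shortly before it expires.

    `fetch` is a hypothetical callable standing in for the real
    STS or Vault request; it must return a fresh Credentials object.
    """

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._current: Optional[Credentials] = None

    def get(self) -> str:
        """Return a token that is valid for at least `margin_seconds` more."""
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - self._margin:
            # Nothing cached yet, or the cached token is about to expire.
            self._current = self._fetch()
        return self._current.token
```

A caller would wrap the real credential request in `fetch` and call `get()` before each batch of requests, so a 2-hour job with 1-hour sessions picks up a new session partway through instead of failing.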



On Tue, Jun 15, 2021 at 9:34 PM Canapathy, Subash <[email protected]> wrote:

My 2 cents. Roughly based on conversations with Airflow users internal and external.

 

  • Platformization and multi-tenancy have been hard to solve in Airflow. Large deployments of Airflow (multi-cluster, on-prem or cloud) need to platformize, manage/govern, and vend Airflow as a service to their internal data engineering communities. The means of achieving this would be DAG manifests and making DAGs a top-level, tracked, GUID-identified entity with permissions and context models associated with it, so that multiple tenants can operate out of the same Airflow environment.
  • Expanding DAG folders into workspaces/profiles. Users will benefit from a high-level construct to group DAGs. This will unlock more opportunities for permission scoping and templatization of access across all DAGs in that profile/folder. This might also have some UX benefits as a side effect: users dislike seeing a thousand DAGs in the list view.
  • Modernizing the DAGProcessing->Executor loop to support a remote DAG fetcher, synced DAGs, and similar features. This would lessen the reliance on on-disk files as the source of truth, and DAG processing could become faster by working purely from the DAG manifest plus the serialized version of the DAG. A secondary process could asynchronously update the manifest and the serialized DAG from either the filesystem (default) or any configured remote DAG fetcher.

 

From: Ash Berlin-Taylor <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, June 15, 2021 at 2:55 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Roadmap ideas for Airflow 2.2 and beyond

 


Hi everyone,

 

As I'm sure many of you are aware I (along with Aizhamal) am giving the opening keynote at this year's Airflow summit, and I'm covering "what's next after 2.0" -- essentially what is the roadmap for Airflow for the next 12-18 months.

 

Since Airflow is a community project first and foremost I'd like to get all your ideas, no matter how off the wall :)

 

I've got my own ideas, and 2.2 is fairly firm already (AIPs 39 and 40), but 2.3 and beyond start to get less clear, so if you have something that you'd like to see Airflow be able to do or do better, now is the time to speak up.

 

You don't have to have a solution, just "I find doing X hard/annoying/difficult" is enough.

 

(And a general reminder: the roadmap is a statement of intent, not a promise of timeline or even that a feature will actually be implemented)

 

To keep this thread manageable, please can we avoid discussions _in this thread_ about ideas and keep +1/me too's to a minimum.

 

Cheers,

-ash
