The example I gave was a simplified version, and also a continuation of another DAG process.
The issue I tried to solve in Airflow in this case (we have other use cases that run into the same issue) was reading n users from MSGraph which were updated and had to be synchronized to our data warehouse. The problem is that for each user we then also need to update the groups they belong to, their devices, their licenses, and so on. Unfortunately, each of those three things needs a dedicated MSGraph call per user: you can't get this information in one call, nor combine it with the updated-users call; you have to do it all individually. So in the above example you get 3 additional MSGraph calls per updated user. With around 1k updated users, that means 3k dynamic tasks.

My first approach was using dynamic tasks, but that exploded very quickly, as explained above, since each updated user triggers 3 calls and users get updated frequently. For example, an updated permission/role for a user will trigger an update; if you have 70k+ users, it can grow quickly.

The original job runs as custom Python code using RxPY (https://github.com/ReactiveX/RxPY), which follows the reactive programming methodology, but we want to step away from it because everything is custom: invoking MSGraph, writing to the database, and also the CI/CD involved in maintaining the project. We want to move away from custom code and have native Airflow jobs, and I personally think this case is perfectly possible in Airflow, at least if we had the "streaming" option, which I use now and which works fine.

-----Original Message-----
From: Daniel Standish <daniel.stand...@astronomer.io.INVALID>
Sent: Wednesday, September 18, 2024 6:41 PM
To: dev@airflow.apache.org
Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

Curious why you want to model this as many tasks, e.g. one page == one task. Another option would be to handle many pages in one task. And I'm curious what were the factors that led you to split it out more granularly.
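
As a rough illustration of the two options discussed in this thread (one task per call versus many users per task), the sketch below only counts tasks. It is plain Python and uses no Airflow or MSGraph API; the helper names, the "user-N" ids, and the batch size of 100 are hypothetical choices made for the example. Note that the number of MSGraph calls stays the same in both cases (3 per updated user); only the number of tasks changes.

```python
from itertools import islice
from typing import Iterable, Iterator

# The three per-user lookups named in the message (labels are illustrative).
PER_USER_CALLS = ("groups", "devices", "licenses")


def one_task_per_call(updated_users: int) -> int:
    """Number of mapped tasks if every per-user call is its own task."""
    return updated_users * len(PER_USER_CALLS)


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def one_task_per_batch(updated_users: int, batch_size: int) -> int:
    """Number of tasks if each task handles a whole batch of users."""
    # Hypothetical user ids, generated only to have something to batch.
    user_ids = (f"user-{i}" for i in range(updated_users))
    return sum(1 for _ in batched(user_ids, batch_size))


print(one_task_per_call(1_000))        # 3000
print(one_task_per_batch(1_000, 100))  # 10
```

With 1k updated users this gives 3k tasks in the first model and 10 tasks in the second, at the cost of coarser retries and less visibility per user.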