Re: Damian's point. Just to split this off into a separate thread:
> While my team produces reports for the entire company, we must segregate
> different business team's information, and to ensure this there is a
> policy that no server has access to information from both teams. We
> separate our workers and set certain Airflow tasks to one server or the
> other, for this case we cannot set sensors to deferrable, because the
> Airflow defer process does not support being able to choose which
> triggerer to send to.

> We use the celery executor and set the queue name on the relevant tasks,
> that queue name corresponds to a celery queue, and then the airflow
> workers are assigned a specific celery queue to read from.

Multi-team Airflow (AIP-67,
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components)
should serve this case very well. Each team can have its own environment,
including dependencies and its own triggerer. It is currently planned for
3.1, so I guess that once it is available, you should be able to migrate
to a Deferrable-only approach for all your sensors.

Do you agree, Damian?

J.

On Wed, Nov 13, 2024 at 8:07 PM Damian Shaw <ds...@striketechnologies.com>
wrote:

> We use the celery executor and set the queue name on the relevant tasks,
> that queue name corresponds to a celery queue, and then the airflow
> workers are assigned a specific celery queue to read from.
>
> Damian
>
> -----Original Message-----
> From: Vikram Koka <vik...@astronomer.io.INVALID>
> Sent: Wednesday, November 13, 2024 12:59 PM
> To: dev@airflow.apache.org
> Subject: Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors for
> Airflow 3 in favor of Deferrable
>
> Damian,
>
> Thank you for that response. That's an interesting use case and really
> appreciate you sharing that perspective.
> Just to make sure that I have clarity on this, how are you "separating
> Airflow Workers and setting Airflow Tasks to one server or the other"
> today?
>
> Vikram
>
> On Wed, Nov 13, 2024 at 8:13 AM Damian Shaw <ds...@striketechnologies.com>
> wrote:
>
> > One use case that does not work with defer is when you're checking
> > resources that are only available on a subset of Airflow workers.
> >
> > While my team produces reports for the entire company, we must
> > segregate different business team's information, and to ensure this
> > there is a policy that no server has access to information from both
> > teams. We separate our workers and set certain Airflow tasks to one
> > server or the other, for this case we cannot set sensors to
> > deferrable, because the Airflow defer process does not support being
> > able to choose which triggerer to send to.
> >
> > I am sure that we aren't the only company with this problem and are
> > using Airflow workers in this way.
> >
> > Damian
> >
> > -----Original Message-----
> > From: ambika garg <ambikagarg1...@gmail.com>
> > Sent: Wednesday, November 13, 2024 11:01 AM
> > To: dev@airflow.apache.org
> > Subject: Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors
> > for Airflow 3 in favor of Deferrable
> >
> > I would vote for option 3 as well, making deferrable operators the
> > default mode ensures that users benefit from the most efficient,
> > async-driven solution without requiring any additional configuration
> > changes. Also, keeping Poke and Reschedule modes ensures backward
> > compatibility with existing operators and users who rely on these modes.
> >
> > On Wed, Nov 13, 2024 at 9:58 AM Vincent Beck <vincb...@apache.org>
> > wrote:
> >
> > > I am definitely in favor of considering the deferrable mode as the
> > > default one. Between 1 and 3, even though I am a big fan of removing
> > > and simplifying things in general, I feel like (no real data here)
> > > we are not ready for 1 yet. So my vote would go to 3. I feel like
> > > removing the poke mode would require too much work on the operators.
> > >
> > > On 2024/11/13 13:59:35 Jarek Potiuk wrote:
> > > > I am torn between 1) and 3). While 1) is tempting and it would
> > > > simplify our state management, I think 3) is the safer choice. I
> > > > think if we were not able to convert all our operators to be
> > > > deferrable yet, there are probably many thousands of custom ones
> > > > that will stop working if we remove that feature.
> > > >
> > > > If we go 1) but then any operator that would go to "poke &
> > > > reschedule", should just be "normal sensor" and simply start
> > > > taking resources while waiting. Technically it's not "breaking"
> > > > the flow, but it's likely breaking installation which heavily
> > > > relies on rescheduling and it would dramatically increase resource
> > > > usage. And there is no easy way out short of rewriting all such
> > > > operators to support deferrable.
> > > >
> > > > I think personally that in making such decision we should consider
> > > > two things:
> > > >
> > > > 1) will this stop some people from migrating to Airflow 3 because
> > > > it will be "heavy operation"?
> > > > 2) how likely we think it's going to happen - will we use big,
> > > > important users who might be "success" stories for Airflow 3.
> > > >
> > > > I have no data to back it up, (maybe some people here could have
> > > > it) - but my intuition tells me that:
> > > >
> > > > 1) yes it will stop some users from migrating to Airflow 3
> > > > (because they will either have to accept increased resource usage
> > > > or find engineering time to rewrite their custom operators)
> > > > 2) yeah, I think it's quite likely and quite likely big users that
> > > > could be "Airflow 3 success story" might be affected
> > > >
> > > > But I am guessing. If someone could provide some data telling that
> > > > either 1) or 2) assumption I made is false, I am happy to support
> > > > option 1). For now it's 3).
> > > >
> > > > On Wed, Nov 13, 2024 at 2:43 PM Abhishek Bhakat
> > > > <abhishek.bha...@astronomer.io.invalid> wrote:
> > > >
> > > > > +1 to 3. For cases where absolute minimal latency is critical,
> > > > > and worker resources aren't constrained, poke mode could still
> > > > > be the optimal choice. I don't see any value in reschedule mode
> > > > > anymore, deferrable should be the default.
> > > > >
> > > > > On Wed, Nov 13, 2024 at 12:21 PM Kaxil Naik
> > > > > <kaxiln...@gmail.com> wrote:
> > > > >
> > > > > > There is 4th option to keep things as-is too :)
> > > > > >
> > > > > > On Wed, 13 Nov 2024 at 12:19, Kaxil Naik <kaxiln...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Following up on the Dev call discussions last Thursday, I am
> > > > > > > opening this up for discussion.
> > > > > > >
> > > > > > > Reschedule mode was introduced to improve efficiency over
> > > > > > > poke mode by allowing tasks to wait without holding a worker
> > > > > > > slot. Since the introduction of deferrable operators in
> > > > > > > Airflow 2.2, however, we now have an even more optimal,
> > > > > > > async-driven solution. The adoption of deferrable operators
> > > > > > > has been really good, and since we are already chopping
> > > > > > > things off with Airflow 3 it might be time to consider making
> > > > > > > them the default mode.
> > > > > > >
> > > > > > > This will ensure that our users always have the most optimal
> > > > > > > way of running sensors by default and that we, the
> > > > > > > maintainers or folks supporting Airflow deployments in
> > > > > > > companies, do not need to know different approaches with
> > > > > > > Reschedule mode, either.
> > > > > > >
> > > > > > > However, not all sensors can be async, either due to
> > > > > > > limitations in underlying libraries or a lack of unique ids
> > > > > > > for async polling.
> > > > > > >
> > > > > > > Knowing that we have a few options:
> > > > > > >
> > > > > > > 1) *Remove Poke & Reschedule modes*
> > > > > > >
> > > > > > > This is aggressive and it means we will have to remove
> > > > > > > things like PostgresSensor that does not support async.
> > > > > > >
> > > > > > > 2) *Remove Reschedule mode*
> > > > > > >
> > > > > > > Make deferrable the primary mode, falling back to poke where
> > > > > > > async isn’t supported.
> > > > > > >
> > > > > > > 3) *Make Deferrable the default, keep Poke & Reschedule*
> > > > > > >
> > > > > > > This is a defensive option that maintains current behaviour
> > > > > > > but ensures that we have the most performant option by
> > > > > > > default. It could be as simple as making
> > > > > > > AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE default to True.
> > > > > > >
> > > > > > > I’d love to hear feedback, especially from users who rely on
> > > > > > > reschedule mode today!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kaxil
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> >
> > ________________________________
> > Strike Technologies, LLC (“Strike”) is part of the GTS family of
> > companies. Strike is a technology solutions provider, and is not a
> > broker or dealer and does not transact any securities related business
> > directly whatsoever. This communication is the property of Strike and
> > its affiliates, and does not constitute an offer to sell or the
> > solicitation of an offer to buy any security in any jurisdiction. It
> > is intended only for the person to whom it is addressed and may
> > contain information that is privileged, confidential, or otherwise
> > protected from disclosure.
> > Distribution or copying of this communication, or the information
> > contained herein, by anyone other than the intended recipient is
> > prohibited. If you have received this communication in error, please
> > immediately notify Strike at i...@striketechnologies.com, and delete
> > and destroy any copies hereof.
> > ________________________________
> >
> > CONFIDENTIALITY / PRIVILEGE NOTICE: This transmission and any
> > attachments are intended solely for the addressee. This transmission
> > is covered by the Electronic Communications Privacy Act, 18 U.S.C
> > §§2510-2521. The information contained in this transmission is
> > confidential in nature and protected from further use or disclosure
> > under U.S. Pub. L. 106-102, 113 U.S. Stat. 1338 (1999), and may be
> > subject to attorney-client or other legal privilege. Your use or
> > disclosure of this information for any purpose other than that
> > intended by its transmittal is strictly prohibited, and may subject
> > you to fines and/or penalties under federal and state law. If you are
> > not the intended recipient of this transmission, please DESTROY ALL
> > COPIES RECEIVED and confirm destruction to the sender via return
> > transmittal.
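
For readers following the thread, the queue-based routing Damian describes, and the gap that appears once a sensor defers, can be modelled in a few lines of plain Python. This is only a toy sketch and not Airflow code: `Task`, `Worker`, `eligible_hosts`, `hosts_if_deferred`, and the host and queue names are all hypothetical. It assumes, as stated in the thread, that the defer process cannot choose which triggerer receives a trigger.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    queue: str  # hypothetical: the Celery queue name set on the task


@dataclass(frozen=True)
class Worker:
    host: str
    queues: frozenset[str]  # queues this worker is assigned to read from


def eligible_hosts(task: Task, workers: list[Worker]) -> list[str]:
    """Hosts that may run `task`: only workers reading from its queue."""
    return [w.host for w in workers if task.queue in w.queues]


def hosts_if_deferred(triggerer_hosts: list[str]) -> list[str]:
    """Assumption from the thread: a deferred task cannot pick a triggerer,
    so every triggerer host is a possible destination."""
    return list(triggerer_hosts)


# One worker per business team, each reading only from its own queue.
workers = [
    Worker("server-a", frozenset({"team_a"})),
    Worker("server-b", frozenset({"team_b"})),
]
sensor = Task("wait_for_team_a_file", queue="team_a")

# Poke/reschedule mode: the sensor stays on team A's server.
print(eligible_hosts(sensor, workers))

# Deferrable mode: the queue no longer constrains where the wait happens.
print(hosts_if_deferred(["server-a", "server-b"]))
```

In this model the per-team isolation holds only while the sensor runs on a worker, which is the gap that a per-team triggerer (as proposed in AIP-67) would close.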