Yeah

On Wed, 13 Nov 2024 at 22:26, Jarek Potiuk <ja...@potiuk.com> wrote:
> Re: Damian's point.
>
> Just to separate this to another thread:
>
> > While my team produces reports for the entire company, we must segregate
> > different business team's information, and to ensure this there is a
> > policy that no server has access to information from both teams. We
> > separate our workers and set certain Airflow tasks to one server or the
> > other, for this case we cannot set sensors to deferrable, because the
> > Airflow defer process does not support being able to choose which
> > triggerer to send to.
>
> > We use the celery executor and set the queue name on the relevant tasks,
> > that queue name corresponds to a celery queue, and then the airflow
> > workers are assigned a specific celery queue to read from.
>
> Multi-team airflow (AIP-67)
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components
> - should serve very well in this case. Each team can have its own
> environment, including dependencies and its own triggerer. It is planned to
> be implemented for 3.1 for now, so I guess this means that when it is
> available, you should be able to migrate to a Deferrable-only approach for
> all your sensors.
>
> Do you agree Damian?
>
> J.
>
> On Wed, Nov 13, 2024 at 8:07 PM Damian Shaw <ds...@striketechnologies.com>
> wrote:
>
> > We use the celery executor and set the queue name on the relevant tasks,
> > that queue name corresponds to a celery queue, and then the airflow
> > workers are assigned a specific celery queue to read from.
> >
> > Damian
> >
> > -----Original Message-----
> > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > Sent: Wednesday, November 13, 2024 12:59 PM
> > To: dev@airflow.apache.org
> > Subject: Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors for
> > Airflow 3 in favor of Deferrable
> >
> > Damian,
> >
> > Thank you for that response. That's an interesting use case and really
> > appreciate you sharing that perspective.
> > Just to make sure that I have clarity on this, how are you "separating
> > Airflow Workers and setting Airflow Tasks to one server or the other"
> > today?
> >
> > Vikram
> >
> > On Wed, Nov 13, 2024 at 8:13 AM Damian Shaw <ds...@striketechnologies.com>
> > wrote:
> >
> > > One use case that does not work with defer is when you're checking
> > > resources that are only available on a subset of Airflow workers.
> > >
> > > While my team produces reports for the entire company, we must
> > > segregate different business team's information, and to ensure this
> > > there is a policy that no server has access to information from both
> > > teams. We separate our workers and set certain Airflow tasks to one
> > > server or the other, for this case we cannot set sensors to
> > > deferrable, because the Airflow defer process does not support being
> > > able to choose which triggerer to send to.
> > >
> > > I am sure that we aren't the only company with this problem and are
> > > using Airflow workers in this way.
> > >
> > > Damian
> > >
> > > -----Original Message-----
> > > From: ambika garg <ambikagarg1...@gmail.com>
> > > Sent: Wednesday, November 13, 2024 11:01 AM
> > > To: dev@airflow.apache.org
> > > Subject: Re: [DISCUSSION] Replace Poke & Reschedule mode from Sensors
> > > for Airflow 3 in favor of Deferrable
> > >
> > > I would vote for option 3 as well, making deferrable operators the
> > > default mode ensures that users benefit from the most efficient,
> > > async-driven solution without requiring any additional configuration
> > > changes. Also, keeping Poke and Reschedule modes ensures backward
> > > compatibility with existing operators and users who rely on these
> > > modes.
> > >
> > > On Wed, Nov 13, 2024 at 9:58 AM Vincent Beck <vincb...@apache.org>
> > > wrote:
> > >
> > > > I am definitely in favor of considering the deferrable mode as the
> > > > default one.
> > > > Between 1 and 3, even though I am a big fan of removing
> > > > and simplifying things in general, I feel like (no real data here)
> > > > we are not ready for 1 yet. So my vote would go to 3. I feel like
> > > > removing the poke mode would require too much work on the operators.
> > > >
> > > > On 2024/11/13 13:59:35 Jarek Potiuk wrote:
> > > > > I am torn between 1) and 3). While 1) is tempting and it would
> > > > > simplify our state management, I think 3) is the safer choice. I
> > > > > think if we were not able to convert all our operators to be
> > > > > deferrable yet, there are probably many thousands of custom ones
> > > > > that will stop working if we remove that feature.
> > > > >
> > > > > If we go 1) but then any operator that would go to "poke &
> > > > > reschedule", should just be "normal sensor" and simply start
> > > > > taking resources while waiting. Technically it's not "breaking"
> > > > > the flow, but it's likely breaking installation which heavily
> > > > > relies on rescheduling and it would dramatically increase resource
> > > > > usage. And there is no easy way out short of rewriting all such
> > > > > operators to support deferrable.
> > > > >
> > > > > I think personally that in making such decision we should consider
> > > > > two things:
> > > > >
> > > > > 1) will this stop some people from migrating to Airflow 3 because
> > > > > it will be "heavy operation"?
> > > > > 2) how likely we think it's going to happen - will we lose big,
> > > > > important users who might be "success" stories for Airflow 3.
> > > > >
> > > > > I have no data to back it up (maybe some people here could have
> > > > > it) - but my intuition tells me that:
> > > > >
> > > > > 1) yes it will stop some users from migrating to Airflow 3
> > > > > (because they will either have to accept increased resource usage
> > > > > or find engineering time to rewrite their custom operators)
> > > > > 2) yeah, I think it's quite likely and quite likely big users that
> > > > > could be "Airflow 3 success story" might be affected
> > > > >
> > > > > But I am guessing. If someone could provide some data telling that
> > > > > either 1) or 2) assumption I made is false, I am happy to support
> > > > > option 1). For now it's 3).
> > > > >
> > > > > On Wed, Nov 13, 2024 at 2:43 PM Abhishek Bhakat
> > > > > <abhishek.bha...@astronomer.io.invalid> wrote:
> > > > >
> > > > > > +1 to 3. For cases where absolute minimal latency is critical,
> > > > > > and worker resources aren't constrained, poke mode could still
> > > > > > be the optimal choice. I don't see any value in reschedule mode
> > > > > > anymore, deferrable should be the default.
> > > > > >
> > > > > > On Wed, Nov 13, 2024 at 12:21 PM Kaxil Naik <kaxiln...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > There is 4th option to keep things as-is too :)
> > > > > > >
> > > > > > > On Wed, 13 Nov 2024 at 12:19, Kaxil Naik <kaxiln...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Following up on the Dev call discussions last Thursday, I am
> > > > > > > > opening this up for discussion.
> > > > > > > >
> > > > > > > > Reschedule mode was introduced to improve efficiency over
> > > > > > > > poke mode by allowing tasks to wait without holding a worker
> > > > > > > > slot.
> > > > > > > > Since the introduction of deferrable operators in Airflow
> > > > > > > > 2.2, however, we now have an even more optimal, async-driven
> > > > > > > > solution. The adoption of deferrable operators has been
> > > > > > > > really good, and since we are already chopping things off
> > > > > > > > with Airflow 3 it might be time to consider making them the
> > > > > > > > default mode.
> > > > > > > >
> > > > > > > > This will ensure that our users always have the most optimal
> > > > > > > > way of running sensors by default and that we, the
> > > > > > > > maintainers or folks supporting Airflow deployments in
> > > > > > > > companies, do not need to know different approaches with
> > > > > > > > Reschedule mode, either.
> > > > > > > >
> > > > > > > > However, not all sensors can be async, either due to
> > > > > > > > limitations in underlying libraries or a lack of unique ids
> > > > > > > > for async polling.
> > > > > > > >
> > > > > > > > Knowing that we have a few options:
> > > > > > > >
> > > > > > > > 1) *Remove Poke & Reschedule modes*
> > > > > > > >
> > > > > > > > This is aggressive and it means we will have to remove
> > > > > > > > things like PostgresSensor that does not support async.
> > > > > > > >
> > > > > > > > 2) *Remove Reschedule mode*
> > > > > > > >
> > > > > > > > Make deferrable the primary mode, falling back to poke where
> > > > > > > > async isn’t supported.
> > > > > > > >
> > > > > > > > 3) *Make Deferrable the default, keep Poke & Reschedule*
> > > > > > > >
> > > > > > > > This is a defensive option that maintains current behaviour
> > > > > > > > but ensures that we have the most performant option by
> > > > > > > > default.
> > > > > > > > It could be as simple as making
> > > > > > > > AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE default to True.
> > > > > > > >
> > > > > > > > I’d love to hear feedback, especially from users who rely on
> > > > > > > > reschedule mode today!
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Kaxil
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
> > > ________________________________
> > > Strike Technologies, LLC (“Strike”) is part of the GTS family of
> > > companies. Strike is a technology solutions provider, and is not a
> > > broker or dealer and does not transact any securities related business
> > > directly whatsoever. This communication is the property of Strike and
> > > its affiliates, and does not constitute an offer to sell or the
> > > solicitation of an offer to buy any security in any jurisdiction. It
> > > is intended only for the person to whom it is addressed and may
> > > contain information that is privileged, confidential, or otherwise
> > > protected from disclosure.
> > > Distribution or copying of this communication, or the information
> > > contained herein, by anyone other than the intended recipient is
> > > prohibited. If you have received this communication in error, please
> > > immediately notify Strike at i...@striketechnologies.com, and delete
> > > and destroy any copies hereof.
> > > ________________________________
> > >
> > > CONFIDENTIALITY / PRIVILEGE NOTICE: This transmission and any
> > > attachments are intended solely for the addressee. This transmission
> > > is covered by the Electronic Communications Privacy Act, 18 U.S.C
> > > §§2510-2521.
> > > The information contained in this transmission is
> > > confidential in nature and protected from further use or disclosure
> > > under U.S. Pub. L. 106-102, 113 U.S. Stat. 1338 (1999), and may be
> > > subject to attorney-client or other legal privilege. Your use or
> > > disclosure of this information for any purpose other than that
> > > intended by its transmittal is strictly prohibited, and may subject
> > > you to fines and/or penalties under federal and state law. If you are
> > > not the intended recipient of this transmission, please DESTROY ALL
> > > COPIES RECEIVED and confirm destruction to the sender via return
> > > transmittal.
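
For readers following Damian's point, here is a minimal sketch of why queue-based routing gives the isolation he describes while deferral does not. It is a toy model in plain Python, not Airflow or Celery code: the `Task`, `Worker`, `eligible_workers`, and `eligible_triggerers` names, the server names, and the queue names are all hypothetical, and the only behaviour it assumes is what the thread states (tasks carry a queue name, workers read from specific queues, and the defer process offers no way to choose a triggerer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task that has a queue name set on it."""
    task_id: str
    queue: str


@dataclass(frozen=True)
class Worker:
    """Hypothetical stand-in for a worker assigned to specific queues."""
    name: str
    queues: frozenset

    def accepts(self, task: Task) -> bool:
        # A worker only picks up tasks from the queues it reads from.
        return task.queue in self.queues


def eligible_workers(task, workers):
    """Names of the workers that may run `task` under queue routing."""
    return [w.name for w in workers if w.accepts(task)]


def eligible_triggerers(task, triggerers):
    """Names of the triggerers that may pick up `task` once it defers.

    Assumption taken from the thread: there is no way to choose a
    triggerer, so the task's queue plays no part in the selection.
    """
    return list(triggerers)


# One server per business team, each reading only its own queue.
workers = [
    Worker("server-a", frozenset({"team_a"})),
    Worker("server-b", frozenset({"team_b"})),
]
triggerers = ["triggerer-1", "triggerer-2"]
sensor = Task("wait_for_team_a_report", queue="team_a")

print("workers:", eligible_workers(sensor, workers))
print("triggerers:", eligible_triggerers(sensor, triggerers))
```

In this model the sensor can only ever run on the team A server while it pokes or reschedules, but once deferred it may land on any triggerer, which is the policy violation described above. A per-team triggerer, as proposed for AIP-67, would amount to making `eligible_triggerers` filter on the team as well.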