Yes, no invite required. See you tomorrow!
On 21 Apr 2021, 07:46 +0100, Sumit Maheshwari <msu...@apache.org>, wrote:
> I'll join as well (I believe the zoom link will work without an invite)
>
> > On Wed, Apr 21, 2021 at 10:48 AM Dimitris Stafylarakis <xan...@gmail.com> 
> > wrote:
> > > hi all,
> > >
> > > great to read about this, I'd like to join in! Can I just join using the 
> > > zoom link tomorrow or do I need an invitation? (If I do need one, please 
> > > invite me :))
> > >
> > > cheers
> > >
> > >
> > > > On Wed, Apr 14, 2021 at 8:15 PM Daniel Imberman 
> > > > <daniel.imber...@gmail.com> wrote:
> > > > > Thank you Ian,
> > > > >
> > > > > I’ve invited everyone on this thread to the meeting with that zoom 
> > > > > link. Anyone else who wants to join can add the calendar event here 
> > > > > calendar.google.com/event?action=TEMPLATE&tmeid=Mm4zN2Q3MnFwNnBqbW9hMmNocXMyNzJpdHYgZGFuaWVsQGFzdHJvbm9tZXIuaW8&tmsrc=dan...@astronomer.io
> > > > >
> > > > > On Wed, Apr 14, 2021 at 11:05 AM, Ian Buss <ianjb...@gmail.com> wrote:
> > > > > > If this works for everyone, here's a zoom link for Thursday 8AM 
> > > > > > PST: 
> > > > > > https://cloudera.zoom.us/j/99928254235?pwd=VTFlQk4vQjQ5Z2JzUDM3ZWZKKy9MQT09
> > > > > >
> > > > > > Happy to move or use an alternate method as needed.
> > > > > >
> > > > > > > On Wed, Apr 14, 2021 at 6:58 PM Daniel Imberman 
> > > > > > > <daniel.imber...@gmail.com> wrote:
> > > > > > > > Thursday works for me!
> > > > > > > >
> > > > > > > > On Wed, Apr 14, 2021 at 10:05 AM, Ian Buss <ianjb...@gmail.com> 
> > > > > > > > wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I actually can’t do Wednesday next week as I’m moving house 
> > > > > > > > > :) Any chance we could do Thursday or Friday at the same time?
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > Ian
> > > > > > > > > On 14 Apr 2021, 17:49 +0100, Kaxil Naik 
> > > > > > > > > <kaxiln...@gmail.com>, wrote:
> > > > > > > > > > > Just a few comments here:
> > > > > > > > > >
> > > > > > > > > > > Currently, and at least for the foreseeable future, Airflow 
> > > > > > > > > > > workers will need access to the DAG files, so workers 
> > > > > > > > > > > cannot run using the serialized DAGs.
> > > > > > > > > >
> > > > > > > > > > > Also, serialized DAGs do not even have all the info needed 
> > > > > > > > > > > to run them. Currently the serialization happens in the 
> > > > > > > > > > > parsing process in the scheduler, which could in future be 
> > > > > > > > > > > split out as a separate "parsing" component, but that 
> > > > > > > > > > > won't solve the "isolation" problem you are trying to 
> > > > > > > > > > > solve. The only current way it can be solved is pickling -- 
> > > > > > > > > > > and we have strictly decided against using pickling for 
> > > > > > > > > > > DAGs.
> > > > > > > > > >
> > > > > > > > > > The idea in Statement (2) & (3) would help solve the 
> > > > > > > > > > isolation problem in (1) and can be done with some work now.
> > > > > > > > > >
> > > > > > > > > > > Happy to talk about it in more detail here or on a call; 
> > > > > > > > > > > the time Daniel suggested works for me.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Kaxil
> > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 14, 2021 at 5:35 PM Daniel Imberman 
> > > > > > > > > > > <daniel.imber...@gmail.com> wrote:
> > > > > > > > > > > > How about Wednesday, April 21 at 8:00AM PST?
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 14, 2021 at 9:33 AM, Xinbin Huang 
> > > > > > > > > > > > <bin.huan...@gmail.com> wrote:
> > > > > > > > > > > > > > I am available any day.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 14, 2021, 9:32 AM Daniel Imberman 
> > > > > > > > > > > > > > <daniel.imber...@gmail.com> wrote:
> > > > > > > > > > > > > > > Hi everyone!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Would people be available around 8AM/9AM PST at 
> > > > > > > > > > > > > > > > some point next week? I’m in PST and Ian is UTC+1, 
> > > > > > > > > > > > > > > > so it would be great to find a time that 
> > > > > > > > > > > > > > > > accommodates everyone.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Daniel
> > > > > > > > > > > > > > > On Wed, Apr 14, 2021 at 6:26 AM, Ryan Hatter 
> > > > > > > > > > > > > > > <ryannhat...@gmail.com> wrote:
> > > > > > > > > > > > > > > > I’d also like to be added please :)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Apr 13, 2021, at 21:27, Xinbin Huang 
> > > > > > > > > > > > > > > > > <bin.huan...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Daniel & Ian,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I am also interested in the idea of a 
> > > > > > > > > > > > > > > > > serialization representation that can be 
> > > > > > > > > > > > > > > > > executed by workers directly. Can you also 
> > > > > > > > > > > > > > > > > add me to the call?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > Bin
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Apr 13, 2021 at 2:49 PM Ian Buss 
> > > > > > > > > > > > > > > > > > <ianjb...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > Daniel,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for your warm welcome and quick 
> > > > > > > > > > > > > > > > > > > response and the advice on providers! 
> > > > > > > > > > > > > > > > > > > Will certainly check out the examples you 
> > > > > > > > > > > > > > > > > > > sent.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1. An "airflow register" command 
> > > > > > > > > > > > > > > > > > > definitely sounds promising, would love 
> > > > > > > > > > > > > > > > > > > to collaborate on an AIP there so let's 
> > > > > > > > > > > > > > > > > > > set something up.
> > > > > > > > > > > > > > > > > > > 2. We use KubernetesExecutor exclusively 
> > > > > > > > > > > > > > > > > > > as well. We've noticed significant 
> > > > > > > > > > > > > > > > > > > additional load on the metadata DB as we 
> > > > > > > > > > > > > > > > > > > scale up task pods so I've also thought 
> > > > > > > > > > > > > > > > > > > about an API-based approach. Such an API 
> > > > > > > > > > > > > > > > > > > could also open up the possibility of 
> > > > > > > > > > > > > > > > > > > per-task security tokens which are 
> > > > > > > > > > > > > > > > > > > injected by the scheduler, which should 
> > > > > > > > > > > > > > > > > > > improve the security of such a system. 
> > > > > > > > > > > > > > > > > > > Food for thought at least. I will start 
> > > > > > > > > > > > > > > > > > > putting some of these thoughts down on 
> > > > > > > > > > > > > > > > > > > paper in a sharable format.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Ian
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Apr 13, 2021 at 7:46 PM Daniel 
> > > > > > > > > > > > > > > > > > > > Imberman <daniel.imber...@gmail.com> 
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > Hi Ian,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Firstly, welcome to the Airflow 
> > > > > > > > > > > > > > > > > > > > > community :). I'm glad to hear you've 
> > > > > > > > > > > > > > > > > > > > > had a positive experience so far. 
> > > > > > > > > > > > > > > > > > > > > It's great to hear that you want to 
> > > > > > > > > > > > > > > > > > > > > contribute back, and I think that 
> > > > > > > > > > > > > > > > > > > > > multi-tenancy/DAG isolation is a 
> > > > > > > > > > > > > > > > > > > > > pretty fantastic project for the 
> > > > > > > > > > > > > > > > > > > > > > community as a whole (a lot of these 
> > > > > > > > > > > > > > > > > > > > > > are things we want but are limited by 
> > > > > > > > > > > > > > > > > > > > > > the hours in a day).
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 1. I've personally been kicking 
> > > > > > > > > > > > > > > > > > > > > around some ideas lately about an 
> > > > > > > > > > > > > > > > > > > > > "airflow register" command that would 
> > > > > > > > > > > > > > > > > > > > > write the DAG into the metadata DB in 
> > > > > > > > > > > > > > > > > > > > > a way that could be "gettable" by the 
> > > > > > > > > > > > > > > > > > > > > workers via the API. This work is 
> > > > > > > > > > > > > > > > > > > > > very early. I'd love to get some help 
> > > > > > > > > > > > > > > > > > > > > on it. Perhaps we can set up a zoom 
> > > > > > > > > > > > > > > > > > > > > chat to discuss drafting an AIP?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 2. Limiting worker access to the DB 
> > > > > > > > > > > > > > > > > > > > > is not only good security practice; 
> > > > > > > > > > > > > > > > > > > > > it also opens up the door to a lot of 
> > > > > > > > > > > > > > > > > > > > > valuable features. This feature would 
> > > > > > > > > > > > > > > > > > > > > be especially close to my heart as it 
> > > > > > > > > > > > > > > > > > > > > would make the KubernetesExecutor 
> > > > > > > > > > > > > > > > > > > > > significantly more efficient. It 
> > > > > > > > > > > > > > > > > > > > > should be possible to set up a system 
> > > > > > > > > > > > > > > > > > > > > where the workers only ever speak to 
> > > > > > > > > > > > > > > > > > > > > an API server and never need to touch 
> > > > > > > > > > > > > > > > > > > > > the DB.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 3. This is not something I personally 
> > > > > > > > > > > > > > > > > > > > > have insight into, but I think it 
> > > > > > > > > > > > > > > > > > > > > sounds like a good idea.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Finally, addressing your question 
> > > > > > > > > > > > > > > > > > > > > about a Cloudera provider. If 
> > > > > > > > > > > > > > > > > > > > > anything, it would probably give the 
> > > > > > > > > > > > > > > > > > > > > provider _more_ legitimacy if you 
> > > > > > > > > > > > > > > > > > > > > hosted it under the Cloudera GitHub 
> > > > > > > > > > > > > > > > > > > > > org (we very purposely created the 
> > > > > > > > > > > > > > > > > > > > > provider packages with this workflow 
> > > > > > > > > > > > > > > > > > > > > in mind). There are multiple places 
> > > > > > > > > > > > > > > > > > > > > where we can work to surface this 
> > > > > > > > > > > > > > > > > > > > > provider so it is easy to find and 
> > > > > > > > > > > > > > > > > > > > > use.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Astronomer has a pretty good sample 
> > > > > > > > > > > > > > > > > > > > > provider here. One example of it 
> > > > > > > > > > > > > > > > > > > > > running in the wild is the Great 
> > > > > > > > > > > > > > > > > > > > > Expectations provider here. I'd also 
> > > > > > > > > > > > > > > > > > > > > be glad to get you in contact with 
> > > > > > > > > > > > > > > > > > > > > people who have built providers in 
> > > > > > > > > > > > > > > > > > > > > the past to help you with that 
> > > > > > > > > > > > > > > > > > > > > process.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Looking forward to seeing some of 
> > > > > > > > > > > > > > > > > > > > > these things come to fruition!
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Daniel
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Tue, Apr 13, 2021 at 9:43 AM, Ian 
> > > > > > > > > > > > > > > > > > > > > Buss <ianjb...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > First a quick introduction: I'm an 
> > > > > > > > > > > > > > > > > > > > > > engineer with Cloudera working on 
> > > > > > > > > > > > > > > > > > > > > > our Data Engineering product (CDE). 
> > > > > > > > > > > > > > > > > > > > > > Airflow is working great for us so 
> > > > > > > > > > > > > > > > > > > > > > far. We've been looking into how we 
> > > > > > > > > > > > > > > > > > > > > > can enhance the multi-tenancy story 
> > > > > > > > > > > > > > > > > > > > > > of Apache Airflow as we currently 
> > > > > > > > > > > > > > > > > > > > > > deploy it. We have the following 
> > > > > > > > > > > > > > > > > > > > > > areas which we'd like (with 
> > > > > > > > > > > > > > > > > > > > > > community consensus) to work on and 
> > > > > > > > > > > > > > > > > > > > > > contribute back to Apache Airflow 
> > > > > > > > > > > > > > > > > > > > > > to enhance the isolation between 
> > > > > > > > > > > > > > > > > > > > > > tenants in a single Airflow 
> > > > > > > > > > > > > > > > > > > > > > deployment.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 1. Isolating code execution and 
> > > > > > > > > > > > > > > > > > > > > > parsing of DAG files. At the 
> > > > > > > > > > > > > > > > > > > > > > moment, DAG files are parsed in a 
> > > > > > > > > > > > > > > > > > > > > > few locations in Airflow, including 
> > > > > > > > > > > > > > > > > > > > > > the scheduler and in tasks. There 
> > > > > > > > > > > > > > > > > > > > > > is already the concept of DAG 
> > > > > > > > > > > > > > > > > > > > > > serialization (and we're using that 
> > > > > > > > > > > > > > > > > > > > > > for the web component) but we'd be 
> > > > > > > > > > > > > > > > > > > > > > interested to see if we can sandbox 
> > > > > > > > > > > > > > > > > > > > > > the execution of arbitrary user 
> > > > > > > > > > > > > > > > > > > > > > code to a locked down 
> > > > > > > > > > > > > > > > > > > > > > process/container without full 
> > > > > > > > > > > > > > > > > > > > > > access to the metadata DB and 
> > > > > > > > > > > > > > > > > > > > > > connection secrets etc. The idea 
> > > > > > > > > > > > > > > > > > > > > > would be to parse and serialize the 
> > > > > > > > > > > > > > > > > > > > > > DAG in this isolated container and 
> > > > > > > > > > > > > > > > > > > > > > pass back a serialized 
> > > > > > > > > > > > > > > > > > > > > > representation for persistence in 
> > > > > > > > > > > > > > > > > > > > > > the DB. Has anyone explored this 
> > > > > > > > > > > > > > > > > > > > > > idea?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 2. Limiting task access to the 
> > > > > > > > > > > > > > > > > > > > > > metadata DB. It would be great if 
> > > > > > > > > > > > > > > > > > > > > > we could remove the requirement for 
> > > > > > > > > > > > > > > > > > > > > > tasks to have full access to the 
> > > > > > > > > > > > > > > > > > > > > > metadata DB and to report task 
> > > > > > > > > > > > > > > > > > > > > > status in a different (but still 
> > > > > > > > > > > > > > > > > > > > > > scalable) way. We'd need to tackle 
> > > > > > > > > > > > > > > > > > > > > > access or injection of connection, 
> > > > > > > > > > > > > > > > > > > > > > variable and xcom data as well for 
> > > > > > > > > > > > > > > > > > > > > > each task naturally.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 3. Finer-grained access controls on 
> > > > > > > > > > > > > > > > > > > > > > connection secrets. Right now, 
> > > > > > > > > > > > > > > > > > > > > > although there are nice at-rest 
> > > > > > > > > > > > > > > > > > > > > > encryption options with Fernet or 
> > > > > > > > > > > > > > > > > > > > > > Vault, IIUC any DAG can access any 
> > > > > > > > > > > > > > > > > > > > > > connection (and thus any secret). 
> > > > > > > > > > > > > > > > > > > > > > Since the "run as" user is largely 
> > > > > > > > > > > > > > > > > > > > > > defined within the DAG and its 
> > > > > > > > > > > > > > > > > > > > > > tasks, this is challenging for a 
> > > > > > > > > > > > > > > > > > > > > > multi-tenant environment (see 
> > > > > > > > > > > > > > > > > > > > > > caveat below)
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Caveat: It's definitely noted that 
> > > > > > > > > > > > > > > > > > > > > > to some extent we should assume 
> > > > > > > > > > > > > > > > > > > > > > that an Airflow deployment is a 
> > > > > > > > > > > > > > > > > > > > > > "trusted" environment and that best 
> > > > > > > > > > > > > > > > > > > > > > practices such as git+PR workflows 
> > > > > > > > > > > > > > > > > > > > > > are the gold standard and that any 
> > > > > > > > > > > > > > > > > > > > > > malicious code and dependencies 
> > > > > > > > > > > > > > > > > > > > > > should be identified through this 
> > > > > > > > > > > > > > > > > > > > > > process. Also that there is a clear 
> > > > > > > > > > > > > > > > > > > > > > admin role for connection 
> > > > > > > > > > > > > > > > > > > > > > management etc.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > We have some ideas informally 
> > > > > > > > > > > > > > > > > > > > > > sketched out as to how to address 
> > > > > > > > > > > > > > > > > > > > > > the above but would be keen to hear 
> > > > > > > > > > > > > > > > > > > > > > the community opinion on this and 
> > > > > > > > > > > > > > > > > > > > > > to see if anyone is keen to 
> > > > > > > > > > > > > > > > > > > > > > collaborate on designs and 
> > > > > > > > > > > > > > > > > > > > > > implementation, or to hear if 
> > > > > > > > > > > > > > > > > > > > > > anything is already in the works. 
> > > > > > > > > > > > > > > > > > > > > > In particular I noticed that the 
> > > > > > > > > > > > > > > > > > > > > > very first improvement proposal 
> > > > > > > > > > > > > > > > > > > > > > (AIP-1) addresses much of the above 
> > > > > > > > > > > > > > > > > > > > > > :). However, it seems fairly 
> > > > > > > > > > > > > > > > > > > > > > dormant at the moment.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > One other question: we have a 
> > > > > > > > > > > > > > > > > > > > > > provider (operators and hooks) for 
> > > > > > > > > > > > > > > > > > > > > > interacting with Cloudera 
> > > > > > > > > > > > > > > > > > > > > > components that we'd like to 
> > > > > > > > > > > > > > > > > > > > > > contribute to the project. The 
> > > > > > > > > > > > > > > > > > > > > > provider FAQs indicate that new 
> > > > > > > > > > > > > > > > > > > > > > provider contributions are still 
> > > > > > > > > > > > > > > > > > > > > > welcome in the project in 2.x, is 
> > > > > > > > > > > > > > > > > > > > > > that accurate?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks in advance!
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Ian
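
Item 1 of Ian's original message (parse the DAG file in a locked-down process and pass back a serialized representation) can be sketched roughly as below. This is a minimal illustration using only the Python standard library, not Airflow's actual parsing or serialization code. The names `parse_isolated`, `CHILD_CODE` and `DAG_SPEC` are hypothetical, and a child process with a stripped environment is process separation only, not a real sandbox.

```python
import json
import os
import subprocess
import sys

# Hypothetical child program: runs the user's file and prints one JSON line.
# Assumption: the file defines a plain dict called DAG_SPEC (not an Airflow name).
CHILD_CODE = """
import json, runpy, sys
namespace = runpy.run_path(sys.argv[1])
print(json.dumps(namespace["DAG_SPEC"]))
"""


def parse_isolated(dag_file: str, timeout: float = 30.0) -> dict:
    """Execute dag_file in a child interpreter and return its serialized spec."""
    # Pass only a bare minimum of environment variables so that secrets held
    # in the parent's environment are not inherited by the user code.
    child_env = {k: os.environ[k] for k in ("PATH", "SYSTEMROOT") if k in os.environ}
    result = subprocess.run(
        [sys.executable, "-I", "-c", CHILD_CODE, dag_file],
        capture_output=True,
        text=True,
        timeout=timeout,
        env=child_env,
        check=True,  # raise if the user code fails
    )
    # User code may print as well; the JSON document is the last stdout line.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

The parent only ever sees the JSON text returned by the child, so it could persist that text without importing or executing the user's module itself. As Kaxil notes in the thread, such a representation does not carry everything a worker needs to run a task.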
