Thanks Chris, will do! On Wed, Jul 5, 2017 at 1:26 PM Chris Riccomini <criccom...@apache.org> wrote:
> @Daniel, done! Should have access. Please create the wiki as a subpage > under: > > https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap > > On Wed, Jul 5, 2017 at 1:20 PM, Daniel Imberman <daniel.imber...@gmail.com > > > wrote: > > > @chris: Thank you! My wiki name is dimberman. > > @gerard: I've started writing out my reply but there's a fair amount to > > respond to so I'll need a few minutes :). > > > > On Wed, Jul 5, 2017 at 1:17 PM Chris Riccomini <criccom...@apache.org> > > wrote: > > > > > @daniel, what's your wiki username? I can grant you access. > > > > > > On Wed, Jul 5, 2017 at 12:35 PM, Gerard Toonstra <gtoons...@gmail.com> > > > wrote: > > > > > > > Hey Daniel, > > > > > > > > Great work. We're looking at running airflow on AWS ECS inside docker > > > > containers and making great progress on this. > > > > We use redis and RDS as managed services to form a comms backbone and > > > then > > > > just spawn webserver, scheduler, worker and flower containers > > > > as needed on ECS. We deploy dags using an Elastic File System (shared > > > > across all instances), which then map this read-only into the docker > > > > container. > > > > We're now evaluating this setup going forward in more earnest. > > > > > > > > Good idea to use queues to separate dependencies or some other > concerns > > > > (high-mem pods?), there are many ways this way that it's possible to > > > > customize where and on which hw a DAG is going to run. We're looking > at > > > > Cycle scaling to temporarily increase resources in a morning run and > > > create > > > > larger worker containers for data science tasks and perhaps some > other > > > > tasks. > > > > > > > > > > > > - In terms of tooling: The current airflow config is somewhat static > > in > > > > the sense that it does not reconfigure itself to the (now) dynamic > > > > environment. > > > > You'd think that airflow should have to query the environment to > > figure > > > > out parallellism instead of statically specifying this. > > > > > > > > - Sometimes DAGs import hooks or operators that import dependencies > at > > > the > > > > top. The only reason, (I think) that a scheduler needs to physically > > > > import and parse a DAG is because there may be dynamically built > > > elements > > > > within the DAG. If there wouldn't be static elements, it is > > theoretically > > > > possible to optimize this. Your PDF sort of hints towards a > system > > > > where a worker where a DAG will eventually run could parse the DAG > and > > > > report > > > > back a meta description of the DAG, which could simplify and > > optimize > > > > performance of the scheduler at the cost of network roundtrips. > > > > > > > > - About redeploying instances: We see this as a potential issue for > > our > > > > setup. My take is that jobs simply shouldn't take that much time in > > > > principle to start with, > > > > which avoids having to worry about this. If that's ridiculous, > > > shouldn't > > > > it be a concern of the environment airflow runs in rather than > airflow > > > > itself? I.e.... > > > > further tool out kubernetes CLI's / operators to query the > > environment > > > > to plan/deny/schedule this kind of work automatically. Beacuse k8s > was > > > > probably > > > > built from the perspective of handling short-running queries, > > running > > > > anything long-term on that is going to naturally compete with the > > > > architecture. > > > > > > > > - About failures and instances disappearing on failure: it's not > > > desirable > > > > to keep those instances around for a long time, we really do need to > > > depend > > > > on > > > > client logging and other services available to tell us what > > happened. > > > > The difference in thinking is that a pod/container is just a > temporary > > > > thing that runs a job > > > > and we should be interested in how the job did vs. how the > > > container/pod > > > > ran this. From my little experience with k8s though, I do see that it > > > tends > > > > to > > > > get rid of everything a little bit too quick on failure. One thing > > you > > > > could look into is to log onto a commonly shared volume with a > specific > > > > 'key' for that container, > > > > so you can always refer back to the important log file and fish > this > > > > out, with measures to clean up the shared filesystem on a regular > > basis. > > > > > > > > - About rescaling and starting jobs: it doesn't come for free as you > > > > mention. I think it's a great idea to be able to scale out on busy > > > > intervals (we intend to just use cycle scaling here), > > > > but a hint towards what policy or scaling strategy you intend to > use > > on > > > > k8s is welcome there. > > > > > > > > > > > > Gerard > > > > > > > > > > > > On Wed, Jul 5, 2017 at 8:43 PM, Daniel Imberman < > > > daniel.imber...@gmail.com > > > > > > > > > wrote: > > > > > > > > > @amit > > > > > > > > > > I've added the proposal to the PR for now. Should make it easier > for > > > > people > > > > > to get to it. Will delete once I add it to the wiki. > > > > > > > > > https://github.com/bloomberg/airflow/blob/29694ae9903c4dad3f18fb8eb767c4 > > > > > 922dbef2e8/dimberman-KubernetesExecutorProposal-050717-1423-36.pdf > > > > > > > > > > Daniel > > > > > > > > > > On Wed, Jul 5, 2017 at 11:36 AM Daniel Imberman < > > > > daniel.imber...@gmail.com > > > > > > > > > > > wrote: > > > > > > > > > > > Hi Amit, > > > > > > > > > > > > For now the design doc is included as an attachment to the > original > > > > > email. > > > > > > Once I am able to get permission to edit the wiki I would like > add > > it > > > > > there > > > > > > but for now I figured that this would get the ball rolling. > > > > > > > > > > > > > > > > > > Daniel > > > > > > > > > > > > > > > > > > On Wed, Jul 5, 2017 at 11:33 AM Amit Kulkarni <am...@wepay.com> > > > wrote: > > > > > > > > > > > >> Hi Daniel, > > > > > >> > > > > > >> I don't see link to design PDF. > > > > > >> > > > > > >> > > > > > >> Amit Kulkarni > > > > > >> Site Reliability Engineer > > > > > >> Mobile: (716)-352-3270 <(716)%20352-3270> <(716)%20352-3270> > <(716)%20352-3270> > > > > > >> > > > > > >> Payments partner to the platform economy > > > > > >> > > > > > >> On Wed, Jul 5, 2017 at 11:25 AM, Daniel Imberman < > > > > > >> daniel.imber...@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > >> > Hello Airflow community! > > > > > >> > > > > > > >> > My name is Daniel Imberman, and I have been working on behalf > of > > > > > >> Bloomberg > > > > > >> > LP to create an airflow kubernetes executor/operator. We > wanted > > to > > > > > allow > > > > > >> > for maximum throughput/scalability, while keeping a lot of the > > > > > >> kubernetes > > > > > >> > details abstracted away from the users. Below I have a link to > > the > > > > WIP > > > > > >> PR > > > > > >> > and the PDF of the initial proposal. If anyone has any > > > > > >> comments/questions I > > > > > >> > would be glad to discuss this feature further. > > > > > >> > > > > > > >> > Thank you, > > > > > >> > > > > > > >> > Daniel > > > > > >> > > > > > > >> > https://github.com/apache/incubator-airflow/pull/2414 > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > >