Hi Fokko,

I doubt we'll do a Behind The Scenes for data so soon after this one. We do hold engineering sessions from time to time; if you're interested in visiting our radically cool office, hit me up personally.

The questions from the audience were mostly about stability. Apparently people are having problems getting Airflow to run consistently and in a stable way. One person specifically mentioned the message queue as a potential issue. I responded that this could be related to the relationship between the number of workers and the configured parallelism (which means a larger number of messages end up in the queue). The other question related to Kubernetes. I pointed out the work being done by Daniel together with two Google engineers. Later on I realized this could potentially be related to violating the resource constraints for a container, primarily because some processes could pick up a lot of data.

Rgds,

Gerard

On Sun, Oct 29, 2017 at 1:20 PM, Driesprong, Fokko <[email protected]> wrote:

> Hi Gerard,
>
> Thanks for sharing the presentation. Unfortunately I could not make it
> to the presentation. Will there be a follow-up? For example, things
> that your team encountered when migrating from Azkaban to Airflow?
>
> Kind regards,
> Fokko Driesprong
>
> 2017-10-29 11:29 GMT+01:00 Gerard Toonstra <[email protected]>:
>
> > Hi all,
> >
> > On Thursday 26/10 my employer Coolblue organized a "Behind the
> > Scenes" event. It is an opportunity for engineers to talk about
> > stuff they work on, and usually they provide two presentations.
> >
> > This event was about BigData and Processing. As (now) team lead of
> > Data Platform, I decided to talk about Apache Airflow, which we are
> > now in the process of migrating to (from Azkaban).
> >
> > Here are the slides:
> >
> > https://www.linkedin.com/feed/update/urn:li:activity:6330346647347875840
> >
> > It is a technical presentation, aimed at informing people who are
> > new to Airflow what the underlying architecture is and also
> > presenting why you'd want to use it in the first place. I based the
> > architectural diagrams on AWS on the PoC we did some time ago.
> >
> > Important takeaway:
> >
> > Airflow is built around some great design principles, which are the
> > result of important insights into data processing. These principles
> > result in a tool that, when used correctly according to these
> > principles, reduces the ETL effort and maintenance and frees up time
> > for higher-level, intelligent work like Machine Learning, Deep
> > Learning and analysis of your data.
> >
> > It is very similar to the talk I gave at BigData Week London 2017:
> >
> > https://youtu.be/Ch2AQhOhefw
> >
> > Rgds,
> >
> > Gerard
