Hey Dan,

Thanks for the update! Please keep us posted.

Cheers,
Chris

On Mon, May 2, 2016 at 4:47 PM, Dan Davydov <[email protected]> wrote:

> So a quick update: unfortunately we saw some DAGBag parsing time increases
> (~10x for some DAGs) on the webservers with 1.7.1rc3. Because of this I
> will be working on a staging cluster that has a copy of our production
> DAGBag and is a copy of our production Airflow infrastructure, just
> without the workers. This will let us debug the release outside of
> production.
>
> On Thu, Apr 28, 2016 at 10:20 AM, Dan Davydov <[email protected]> wrote:
>
> > Definitely, here were the issues we hit:
> > - airbnb/airflow#1365 occurred
> > - Webservers/scheduler were timing out and stuck in restart cycles due
> >   to increased time spent parsing DAGs after airbnb/airflow#1213/files
> > - Failed tasks that ran after the upgrade and the revert (after we
> >   reverted the upgrade) could not be cleared (but running the tasks
> >   through the UI worked without clearing them)
> > - The way log files are stored on S3 was changed (Airflow now requires a
> >   connection to be set up), which broke log storage
> > - Some DAGs were broken (unable to be parsed) due to package
> >   reorganization in open source (the import paths were changed in the
> >   utils refactor commit)
> >
> > On Thu, Apr 28, 2016 at 12:17 AM, Bolke de Bruin <[email protected]> wrote:
> >
> >> Dan,
> >>
> >> Are you able to share some of the bugs you have been hitting and the
> >> connected commits?
> >>
> >> We could at the very least learn from them and maybe even improve
> >> testing.
> >>
> >> Bolke
> >>
> >> > Op 28 apr. 2016, om 06:51 heeft Dan Davydov <[email protected]> het volgende geschreven:
> >> >
> >> > All of the blockers were fixed as of yesterday (there was some issue
> >> > that Jeremiah was looking at with the last release candidate, which I
> >> > think is fixed, but I'm not sure). I started staging the
> >> > airbnb_1.7.1rc3 tag earlier today, so as long as metrics look OK and
> >> > the 1.7.1rc2 issues seem resolved tomorrow, I will release internally
> >> > either tomorrow or Monday (we try to avoid releases on Friday). If
> >> > there aren't any issues, we can push the 1.7.1 tag on Monday/Tuesday.
> >> >
> >> > @Sid
> >> > I think we were originally aiming to deploy internally once every two
> >> > weeks, but we decided to do it once a month in the end. I'm not too
> >> > sure about that, so Max can comment there.
> >> >
> >> > We have been running 1.7.0 in production for about a month now and it
> >> > is stable.
> >> >
> >> > I think what really slowed down this release cycle is some commits
> >> > that caused severe bugs that we decided to roll forward with instead
> >> > of rolling back. We can potentially try reverting such commits next
> >> > time while the fixes are applied for the next version, although this
> >> > is not always trivial to do.
> >> >
> >> > On Wed, Apr 27, 2016 at 9:31 PM, Siddharth Anand <[email protected]> wrote:
> >> >
> >> >> Btw, are any of the committers running 1.7.0 or later in any staging
> >> >> or production env? I have to say that the fact that 1.6.2 was the
> >> >> most stable release and is four or more months old does not say much
> >> >> for our release cadence or process. What's our plan for 1.7.1?
> >> >>
> >> >> Sent from Sid's iPhone
> >> >>
> >> >>> On Apr 27, 2016, at 9:05 PM, Chris Riccomini <[email protected]> wrote:
> >> >>>
> >> >>> Hey all,
> >> >>>
> >> >>> I just wanted to check in on the 1.7.1 release status. I know there
> >> >>> have been some major-ish bugs, as well as several people doing
> >> >>> tests. Should we create a 1.7.1 release JIRA and track outstanding
> >> >>> issues there?
> >> >>>
> >> >>> Cheers,
> >> >>> Chris
