Although the race condition doesn't explain why “num_runs = None” resolved the issue for you earlier, it does give a clue now: the PR that introduced “num_runs = -1” was meant to make the scheduler work with empty DAG dirs, and maybe that case wasn't fully covered yet.
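One way to sidestep the startup race discussed in this thread is to start the scheduler only after the DAG folder is populated. This is roughly what Alex's setup below achieves by running git-sync before the other containers. The following is a minimal sketch, not Airflow code: `dag_folder_ready`, `wait_for_dags` and the `DAGS_FOLDER` environment variable are hypothetical names, and the fallback path assumes Airflow's usual `~/airflow/dags` location.

```python
import os
import sys
import time
from pathlib import Path


def dag_folder_ready(dag_folder: str, min_files: int = 1) -> bool:
    """Return True once the folder exists and holds at least `min_files` .py files."""
    folder = Path(dag_folder)
    if not folder.is_dir():
        return False
    # rglob also counts DAG files synced into subdirectories.
    return sum(1 for _ in folder.rglob("*.py")) >= min_files


def wait_for_dags(dag_folder: str, timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll until the DAG folder is populated or the timeout expires.

    Returns True if the folder became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if dag_folder_ready(dag_folder):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # DAGS_FOLDER is a hypothetical variable for this sketch; the fallback
    # assumes the default ~/airflow/dags location.
    folder = os.environ.get("DAGS_FOLDER", os.path.expanduser("~/airflow/dags"))
    if not wait_for_dags(folder):
        print(f"no DAG files appeared in {folder}", file=sys.stderr)
        sys.exit(1)
    # Falling through exits with status 0, signalling that the folder is ready.
```

A wrapper could run this first and start the scheduler only on success, e.g. `python wait_for_dags.py && airflow scheduler` (the script name is hypothetical). The timeout keeps a broken sync from blocking startup forever, which would otherwise just move the failure somewhere harder to see.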
Bolke

> On 12 Feb 2017, at 12:26, Bolke de Bruin wrote:
>
> Ok great! Thanks! That sounds like a race condition: module not available yet at time of reading. I would expect that it resolves itself after a while.
>
> After talking to some people at the Warsaw BigData conf I have some ideas around syncing dags. Spoiler: no dependency on git.
>
> - Bolke
>
>> On 12 Feb 2017, at 11:17, Alex Van Boxel wrote:
>>
>> Running ok in staging... @bolke I'm running patch-less. I've switched my Kubernetes setup from:
>>
>> - each container (webserver/scheduler/worker) had a git-sync'er (getting the dags from git)
>>   -> this meant that the scheduler had 0 dags at startup, and should have picked them up later
>>
>> to
>>
>> - a single NFS share that shares airflow_home over each container
>>   -> the git-sync'er is now a separate container running before the other containers
>>
>> This resolved my mystery DAG crashes.
>>
>> I'll be updating production to a patch-less RC3 today; you get my vote after that.
>>
>> On Sun, Feb 12, 2017 at 4:59 AM Boris Tyukin wrote:
>>
>>> awesome! thanks Jeremiah
>>>
>>> On Sat, Feb 11, 2017 at 12:53 PM, Jeremiah Lowin wrote:
>>>
>>>> Boris, I submitted a PR to address your second point -- https://github.com/apache/incubator-airflow/pull/2068. Thanks!
>>>>
>>>> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin wrote:
>>>>
>>>>> I am running LocalExecutor and not doing crazy things, but I use DAG generation heavily - everything runs fine as before. As I mentioned in other threads, I only had a few issues:
>>>>>
>>>>> 1) had to upgrade MySQL, which was a PAIN. Cloudera CDH is running an old version of MySQL which was compatible with 1.7.1 but is not compatible now with 1.8 because of the fractional seconds support PR.
>>>>>
>>>>> 2) when you install airflow, there are two new example DAGs (last_task_only) which go back very far in the past and are scheduled to run every hour - a bunch of dags triggered on the first start of the scheduler and hosed my CPU
>>>>>
>>>>> Everything else was fine and I LOVE lots of small UI changes, which greatly reduced my use of the CLI.
>>>>>
>>>>> Thanks again for the amazing work and an awesome project!
>>>>>
>>>>> On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin wrote:
>>>>>
>>>>>> I was able to deploy successfully. +1 (binding)
>>>>>>
>>>>>> On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin wrote:
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer wrote:
>>>>>>>
>>>>>>>> +1 (binding)
>>>>>>>>
>>>>>>>> On Feb 10, 2017 3:13 PM, "Dan Davydov" wrote:
>>>>>>>>
>>>>>>>>> Our staging looks good, all the DAGs there pass.
>>>>>>>>> +1 (binding)
>>>>>>>>>
>>>>>>>>> On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini wrote:
>>>>>>>>>
>>>>>>>>>> Running in all environments. Will vote after the weekend to make sure things are working properly, but so far so good.
>>>>>>>>>>
>>>>>>>>>> On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear All,
>>>>>>>>>>>
>>>>>>>>>>> Let’s try again!
>>>>>>>>>>>
>>>>>>>>>>> I have made the THIRD RELEASE CANDIDATE of Airflow 1.8.0 available at <https://dist.apache.org/repos/dist/dev/incubator/airflow/>; public keys are available at <https://dist.apache.org/repos/dist/release/incubator/airflow/>. It is tagged with a local version “apache.incubating” so it allows upgrading from earlier releases.
>>>>>>>>>>>
>>>>>>>>>>> Two issues have been fixed since release candidate 2:
>>>>>>>>>>>
>>>>>>>>>>> * trigger_dag could create dags with fractional seconds, not supported by logging and UI at the moment
>>>>>>>>>>> * local api client trigger_dag had hardcoded execution of None
>>>>>>>>>>>
>>>>>>>>>>> Known issue:
>>>>>>>>>>> * Airflow on kubernetes and num_runs -1 (default) can expose import issues.
>>>>>>>>>>>
>>>>>>>>>>> I have extensively discussed this with Alex (reporter) and we consider this a known issue with a workaround available, as we are unable to replicate this in a different environment. UPDATING.md has been updated with the workaround.
>>>>>>>>>>>
>>>>>>>>>>> As these issues are confined to a very specific area and full unit tests were added, I would also like to raise a VOTE for releasing 1.8.0 based on release candidate 3, i.e. just renaming release candidate 3 to the 1.8.0 release.
>>>>>>>>>>>
>>>>>>>>>>> Please respond to this email by:
>>>>>>>>>>>
>>>>>>>>>>> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are not.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Bolke
>>>>>>>>>>>
>>>>>>>>>>> My VOTE: +1 (binding)
>>
>> --
>> _/
>> _/ Alex Van Boxel
