Will do. And thanks. Adding another issue:
* Some of our DAGs are not getting scheduled for some unknown reason.
Need to investigate why.

Related but not root cause:
* Logging is so chatty that it gets really hard to find the real issue

Bolke.

> On 20 Jan 2017, at 23:45, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
>
> I'd be happy to lend a hand fixing these issues and hopefully some others
> are too. Do you mind creating jiras for these since you have the full
> context? I have created a JIRA for (1) and have assigned it to myself:
> https://issues.apache.org/jira/browse/AIRFLOW-780
>
> On Fri, Jan 20, 2017 at 1:01 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> This is to report back on some of the (early) experiences we have with
>> Airflow 1.8.0 (beta 1 at the moment):
>>
>> 1. The UI does not show faulty DAGs, leading to confusion for developers.
>> When a faulty dag is placed in the dags folder the UI would report a
>> parsing error. Now it doesn’t due to the separate parsing (but not
>> reporting back errors)
>>
>> 2. The hive hook sets ‘airflow.ctx.dag_id’ in hive
>> We run in a secure environment which requires this variable to be
>> whitelisted if it is modified (needs to be added to UPDATING.md)
>>
>> 3. DagRuns do not exist for certain tasks, but don’t get fixed
>> Log gets flooded without a suggestion what to do
>>
>> 4. At start up all running dag_runs are being checked, we seemed to have a
>> lot of “left over” dag_runs (couple of thousand)
>> - Checking was logged to INFO -> requires a fsync for every log message
>> making it very slow
>> - Checking would happen at every restart, but dag_runs’ states were not
>> being updated
>> - These dag_runs would never be marked anything else than running for
>> some reason
>> -> Applied workaround to update all dag_run in sql before a certain date
>> to -> finished
>> -> need to investigate why dag_runs did not get marked “finished/failed”
>>
>> 5. Our umask is set to 027
>>
>>