Alex, do you have anything more to go on? I don’t mind reverting the patch; however, the code part seems unrelated to what you described, and the issue wasn’t reproducible. I would really like to see more logging, and maybe a test in a clean environment plus debugging. Preferably I would like to make RC 2 available today and immediately raise a vote, as the *current* changes are really small, are confined to contrib, and have been tested by the people using it.
But I am holding off for now due to your concern.

Cheers
Bolke

> On 7 Feb 2017, at 20:56, Bolke de Bruin <[email protected]> wrote:
>
> How do you start the scheduler, Alex? What are the command line parameters?
> What are the logs when it doesn’t work?
>
> Bolke
>
>
>> On 7 Feb 2017, at 18:52, Alex Van Boxel <[email protected]> wrote:
>>
>> Hey Feng,
>>
>> The upgrades are all automated (including the workers/web/scheduler), and I
>> triple-checked: I am now test-running RC1 with just your line reverted
>> (and it looks OK).
>>
>> Could you do me a favour and add a test DAG where you do a local import?
>> Example:
>>
>> bqschema.py
>> def ranking():
>>     return [
>>         {"name": "bucket_date", "type": "timestamp", "mode": "nullable"},
>>         {"name": "rank", "type": "integer", "mode": "nullable"},
>>         {"name": "audience_preference", "type": "float", "mode": "nullable"},
>>         {"name": "audience_likelihood_share", "type": "float", "mode": "nullable"}
>>     ]
>>
>> dag.py
>> import bqschema
>> ...
>>
>> All in the same dag folder. We use it to keep our BigQuery schemas in a
>> separate file.
>>
>>
>> On Tue, Feb 7, 2017 at 6:37 PM Feng Lu <[email protected]> wrote:
>> Hi Alex-
>>
>> Please see the attached screenshots of my local testing using the celery executor
>> (on k8s as well).
>> All looks good and the workflow completed successfully.
>>
>> Curious: did you also update the worker image?
>> Sorry for the confusion; happy to debug more if you could share your
>> k8s setup with me.
>>
>> Feng
>>
>> On Tue, Feb 7, 2017 at 8:37 AM, Feng Lu <[email protected]> wrote:
>> When num_runs is not explicitly specified, the default is set to -1 to match
>> the expectation of SchedulerJob here:
>> <Screen Shot 2017-02-07 at 8.01.26 AM.png>
>>
>> Doing so also matches the type of num_runs ('int' in this case).
>> The scheduler will run non-stop as a result, regardless of whether DAG files
>> are present (since the num_runs default is now -1: unlimited).
>>
>> Based on what Alex described, the import error doesn't look directly
>> related to this change.
>> Maybe this one?
>> https://github.com/apache/incubator-airflow/commit/67cbb966410226c1489bb730af3af45330fc51b9
>>
>> I am still in the middle of running some quick tests using the celery
>> executor; I will update the thread once they're done.
>>
>>
>> On Tue, Feb 7, 2017 at 6:56 AM, Bolke de Bruin <[email protected]> wrote:
>> Hey Alex,
>>
>> Thanks for tracking it down. Can you elaborate on what went wrong with celery?
>> The lines below do not particularly relate to Celery directly, so I wonder
>> why we are not seeing it with the LocalExecutor?
>>
>> Cheers
>> Bolke
>>
>> > On 7 Feb 2017, at 15:51, Alex Van Boxel <[email protected]> wrote:
>> >
>> > I have to give RC1 a *-1*. I spent hours, or rather days, getting the RC
>> > running with Celery on our test environment, until I finally found the
>> > commit that killed it:
>> >
>> > e7f6212cae82c3a3a0bc17bbcbc70646f67d02eb
>> > [AIRFLOW-813] Fix unterminated unit tests in SchedulerJobTest
>> > Closes #2032 from fenglu-g/master
>> >
>> > I was always looking at the wrong thing, because the commit only changes a
>> > single default parameter from *None to -1*.
>> >
>> > I do have the impression I'm the only one running with Celery. Are other
>> > people running with it?
>> >
>> > *I propose* *reverting the commit*. Feng, can you elaborate on this change?
>> >
>> > Changing the default back to *None* in cli.py finally got it working:
>> >
>> >     'num_runs': Arg(
>> >         ("-n", "--num_runs"),
>> >         default=None, type=int,
>> >         help="Set the number of runs to execute before exiting"),
>> >
>> > Thanks.
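[Editor's note: for context on the `None` vs `-1` default being debated above, here is a minimal, hypothetical sketch of the sentinel semantics Feng describes — `-1` meaning "run indefinitely". This is illustrative only, not the actual SchedulerJob code; the function name is invented.]

```python
def should_keep_running(num_runs, completed_runs):
    """Hypothetical loop condition mirroring the -1 sentinel described
    in the thread: -1 means unlimited runs, so the scheduler never
    exits on its own; any non-negative value is a hard bound."""
    if num_runs == -1:
        return True  # unlimited: condition is always true
    return completed_runs < num_runs

# With the new default (-1), the loop never terminates by itself:
assert should_keep_running(-1, 10_000_000)
# With an explicit bound, the loop stops once the bound is reached:
assert not should_keep_running(5, 5)
```

The hazard with a sentinel like this is that any caller still written against the old `None` default (e.g. one that treats a falsy value as "run once") silently changes behavior, which matches the kind of regression being reported in this thread.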
>> >
>> > On Tue, Feb 7, 2017 at 3:49 AM siddharth anand <[email protected]> wrote:
>> >
>> > I did get 1.8.0 installed and running at Agari.
>> >
>> > I did run into 2 problems.
>> > 1. Most of our DAGs broke due to the way Operators are now imported.
>> > https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features
>> >
>> > According to the documentation, these deprecations would only cause an
>> > issue in 2.0. However, I needed to fix them now.
>> >
>> > So, I needed to change "from airflow.operators import PythonOperator" to
>> > "from airflow.operators.python_operator import PythonOperator". Am I
>> > missing something?
>> >
>> > 2. I ran into a migration problem that seems to have cleared itself up. I
>> > did notice that some DAGs do not have data in their "DAG Runs" column on
>> > the overview page. I am looking into that issue presently.
>> > https://www.dropbox.com/s/cn058mtu3vcv8sq/Screenshot%202017-02-06%2018.45.07.png?dl=0
>> >
>> > -s
>> >
>> > On Mon, Feb 6, 2017 at 4:30 PM, Dan Davydov <[email protected]> wrote:
>> >
>> >> Bolke, attached is the patch for the cgroups fix. Let me know which
>> >> branches you would like me to merge it to. If anyone has complaints about
>> >> the patch, let me know (but it does not touch the core of Airflow, only
>> >> the new cgroups task runner).
>> >>
>> >> On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand <[email protected]> wrote:
>> >>
>> >>> Actually, I see the error is further down...
>> >>>
>> >>>   File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 469, in do_execute
>> >>>     cursor.execute(statement, parameters)
>> >>>
>> >>> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
>> >>> column "dag_id" violates not-null constraint
>> >>> DETAIL: Failing row contains (null, running, 1, f).
>> >>> [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
>> >>> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
>> >>> 'dirty': False}]
>> >>>
>> >>> It looks like an autoincrement is missing for this table.
>> >>>
>> >>> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any version
>> >>> greater than 0.9.8.
>> >>>
>> >>> -s
>> >>>
>> >>> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand <[email protected]> wrote:
>> >>>
>> >>>> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
>> >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
>> >>>> and then running airflow upgradedb, which didn't quite work. First, I
>> >>>> thought it completed successfully, then saw errors that some tables were
>> >>>> indeed missing. I ran it again and encountered the following exception:
>> >>>>
>> >>>> DB: postgresql://[email protected]:5432/airflow
>> >>>>
>> >>>> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
>> >>>> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
>> >>>> INFO  [alembic.runtime.migration] Will assume transactional DDL.
>> >>>>
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
>> >>>> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
>> >>>> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
>> >>>>
>> >>>> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
>> >>>> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the primary
>> >>>> key for table 'dag_stats', but has no Python-side or server-side default
>> >>>> generator indicated, nor does it indicate 'autoincrement=True' or
>> >>>> 'nullable=True', and no explicit value is passed. Primary key columns
>> >>>> typically may not store NULL. Note that as of SQLAlchemy 1.1,
>> >>>> 'autoincrement=True' must be indicated explicitly for composite (e.g.
>> >>>> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior is
>> >>>> expected for one of the columns in the primary key. CREATE TABLE statements
>> >>>> are impacted by this change as well on most backends.
>> >>>
>> >>
>> >>
>> >
>> > --
>> > _/
>> > _/ Alex Van Boxel
>>
>>
>> --
>> _/
>> _/ Alex Van Boxel
>
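[Editor's note: the SAWarning in the thread suggests the dag_stats failure may not be a missing autoincrement at all — `dag_id` appears to be a string member of a composite primary key, and the failing INSERT simply omits it, violating the NOT NULL constraint. A minimal, hypothetical reproduction using SQLite; the table shape is inferred from the error message, not taken from Airflow's actual schema.]

```python
import sqlite3

# Table shape inferred from the traceback above: composite primary key
# on (dag_id, state); dag_id is a string, so autoincrement cannot apply.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dag_stats (
        dag_id TEXT NOT NULL,
        state  TEXT NOT NULL,
        count  INTEGER,
        dirty  BOOLEAN,
        PRIMARY KEY (dag_id, state)
    )
""")

# Mirrors the failing INSERT from the thread: dag_id is omitted, so the
# NOT NULL constraint on the primary-key column is violated.
try:
    conn.execute(
        "INSERT INTO dag_stats (state, count, dirty) VALUES (?, ?, ?)",
        ("running", 1, False),
    )
    failed = False
except sqlite3.IntegrityError:
    failed = True

assert failed  # same class of failure as the psycopg2 IntegrityError above

# Supplying dag_id makes the same INSERT succeed:
conn.execute(
    "INSERT INTO dag_stats (dag_id, state, count, dirty) VALUES (?, ?, ?, ?)",
    ("example_dag", "running", 1, False),
)
```

If this reading is right, the fix belongs in the code issuing the INSERT (it must supply `dag_id`), not in the table definition.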
