nathan warshauer created AIRFLOW-1868:
-----------------------------------------

             Summary: Packaged Dags not added to dag table, unable to execute 
tasks
                 Key: AIRFLOW-1868
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1868
             Project: Apache Airflow
          Issue Type: Bug
         Environment: airflow 1.8.2, celery, rabbitMQ, mySQL, aws
            Reporter: nathan warshauer
         Attachments: Screen Shot 2017-11-29 at 2.31.02 PM.png, Screen Shot 
2017-11-29 at 4.40.39 PM.png, Screen Shot 2017-11-29 at 4.42.39 PM.png

.zip files in the dags directory do not appear to be added to the dag table 
in the Airflow metadata database.  When a .zip file containing DAG .py files 
is placed in the dags folder, the dag_id should be added to the dag table, and 
Airflow should allow the DAG to be unpaused and run through the web server.
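
For reference, a minimal sketch of how such a package might be built. It 
assumes the documented Airflow requirement that DAG modules sit at the root of 
the zip file; package_dags and root_modules are hypothetical helper names used 
only for illustration, not part of Airflow.

```python
import os
import zipfile


def package_dags(zip_path, dag_files):
    """Write each DAG file at the root of a new zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in dag_files:
            # arcname drops the directory part so the module sits at the
            # top level of the archive rather than in a subfolder.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def root_modules(zip_path):
    """Return the .py files stored at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.endswith(".py") and "/" not in name
        )
```

Calling root_modules on an existing package is a quick way to rule out a 
layout problem (DAG files nested in a subfolder) before looking at the 
scheduler or the database.
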
Running SELECT distinct dag.dag_id AS dag_dag_id FROM dag confirms that the 
DAG does not exist in the dag table.  It still shows up in the UI, with the 
warning message "This Dag seems to be existing only locally", even though the 
DAG exists in all 3 dags directories (master and two workers) and airflow.cfg 
has donot_pickle = True.
When the DAG is triggered manually via airflow trigger_dag <dag_id>, the run 
shows up in the web server but no tasks are executed.  When I go to the task 
and click start through the UI, the task executes successfully and shows the 
attached state upon completion.  If I skip this step, the tasks never enter 
the queue and the run sits idle, as the 3rd attached image shows.
Basically, the DAG CAN be run manually from the zip, BUT the scheduler and the 
underlying database tables do not appear to handle packaged DAGs correctly.
Please let me know if I can provide any additional information regarding this 
issue, or if you all have any leads that I can check out for resolving this.

from datetime import datetime, timedelta

from airflow import DAG

# default_args must be defined before it is passed to DAG().
default_args = {
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'owner': 'airflow',
    'provide_context': True,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'start_date': datetime(2017, 11, 28),
}

# Runs every 5 minutes, one active run at a time, timing out after 4m30s.
dag = DAG('MY-DAG-NAME',
          default_args=default_args,
          schedule_interval='*/5 * * * *',
          max_active_runs=1,
          dagrun_timeout=timedelta(minutes=4, seconds=30))



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
