Jonny Evans created AIRFLOW-7027:
------------------------------------
Summary: The mirrored data folder for BigQuery_operators can't be
accessed on manual runs
Key: AIRFLOW-7027
URL: https://issues.apache.org/jira/browse/AIRFLOW-7027
Project: Apache Airflow
Issue Type: Bug
Components: contrib, DAG
Affects Versions: 1.10.9
Environment: Windows 10 Pro, i7-4790S Processor, 16 GB RAM
Reporter: Jonny Evans
Using Airflow through Google Cloud Composer, I've placed a series of text
files in the /data folder of the bucket, as the documentation suggests for
storing external data files, and have written a BigQueryOperator of the
following form:
{{
with open('/home/airflow/gcs/data/{0}.txt'.format(models.Variable.get('tmpcreatives')), 'r') as tmp_file:
    tmp_transfer = tmp_file.read()

bq_sql_tmptransfer = bigquery_operator.BigQueryOperator(
    task_id='task1',
    sql="""{0}""".format(
        tmp_transfer.format(
            tradata=dag.params["ClientDatabase"] + dag.params["bq_param1"],
            rawdata=dag.params["ClientDatabase"] + dag.params["bq_param2"],
        )
    ),
    use_legacy_sql=False,
)
}}
On scheduled runs, the DAG runs fine and completes the task. However, if I
try to manually trigger the DAG or view the run logs, it fails with the
message 'DAG "DataCreation_DAG_" seems to be missing'. This only happens
when I use the open() function; if I replace that section with a hardcoded
string, the DAG works fine even on manual runs. I suspect it's a bug with
mounting the /data folder from the Cloud Storage bucket, but I'm not
entirely sure.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)