[ https://issues.apache.org/jira/browse/AIRFLOW-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966122#comment-15966122 ]

Maxime Beauchemin commented on AIRFLOW-97:
------------------------------------------

The rationale was that people may (and will) dump Python files that do 
significant work outside of an `if __name__ == '__main__':` block, and 
Airflow, as it crawls and imports these modules, will trigger that work. We've 
also seen people dump entire libraries into our pipelines folder, and the DagBag 
parsing process will import the living hell out of them.

This is a naive attempt at skipping files that don't look like an Airflow 
pipeline by inspecting the source text without evaluating it.
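A minimal sketch of that kind of check, assuming it is a plain case-sensitive substring test on the raw file bytes. `looks_like_dag_file` is a hypothetical name, not Airflow's function. The file is read but never imported, so nothing at module level runs.

```python
import os


def looks_like_dag_file(path, markers=(b"DAG", b"airflow")):
    """Hypothetical safe-mode heuristic: True only if every marker
    substring appears somewhere in the file's raw bytes.

    The file is read, never imported or executed, so module-level
    side effects cannot run during the check.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        content = f.read()
    # Case-sensitive: "dag" or "Airflow" alone would not satisfy the markers.
    return all(marker in content for marker in markers)
```

Because the test is purely textual, it produces both kinds of error: a library file that merely mentions both words gets imported anyway, and a real DAG file whose imports live in a helper module gets skipped.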

This check may have been introduced after the module-parsing timeout rule. 
Note that the DagBag timeout logic may be preferable in some ways, but it has 
limitations. First, it won't work under LocalExecutor, for reasons I won't get 
into here. Second, it sucks to pay the timeout price at every scheduler cycle. 
Perhaps a better approach would be for the process to add scripts that time 
out to a blacklist and surface that list in the UI. Users would then have to 
re-enable bad actors manually.
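The blacklist proposal above might be sketched like this. It is an illustration of the idea, not Airflow's implementation: `ParseBlacklist`, `time_limit`, and `ParseTimeout` are hypothetical names, and the timeout relies on `signal.setitimer` with SIGALRM, which works only on Unix and only in the main thread. A file that times out pays the timeout once and is skipped on later cycles until someone explicitly re-enables it.

```python
import signal
from contextlib import contextmanager


class ParseTimeout(Exception):
    """Raised when parsing a file exceeds its time budget."""


@contextmanager
def time_limit(seconds):
    # Unix-only, main-thread-only: SIGALRM fires after `seconds`.
    def _handler(signum, frame):
        raise ParseTimeout(f"timed out after {seconds}s")

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)


class ParseBlacklist:
    """Hypothetical registry of files that timed out during parsing."""

    def __init__(self):
        self.blocked = set()

    def process(self, path, parse, timeout):
        # Files flagged on an earlier cycle are skipped without paying the timeout again.
        if path in self.blocked:
            return None
        try:
            with time_limit(timeout):
                return parse(path)
        except ParseTimeout:
            self.blocked.add(path)  # candidate for display in a UI
            return None

    def reenable(self, path):
        # A user manually clears a bad actor, e.g. after fixing the file.
        self.blocked.discard(path)
```

An in-memory set is the simplest choice for a sketch. Surfacing the list in a UI, as proposed, would require persisting it somewhere both the scheduler and the webserver can read.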

> "airflow" "DAG" strings in file necessary to import dag
> -------------------------------------------------------
>
>                 Key: AIRFLOW-97
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-97
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>            Reporter: Etiene Dalcol
>            Priority: Minor
>
> Hello airflow team! Thanks for the awesome tool!
> We made a small module to automate our DAG-building process, and we use 
> this module in our DAG definition. Our Airflow version is 1.7.0.
> However, Airflow will not import this file because it doesn't contain the 
> words DAG and airflow. (The imports, etc., are done inside our little module.) 
> Apparently there's a safe_mode that skips files without these strings.
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L197)
> This safe_mode defaults to True but is not passed to the process_file 
> function, so it is always True and there's no apparent way to disable it.
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L177)
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L313)
> Putting this comment at the top of the file makes it work for now, and it 
> gave me a good laugh today 👯 
> #DAG airflow -> DO NOT REMOVE. the world will explode
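The reporter's point is that the flag has a default but is never forwarded from the caller. The following is a simplified sketch of what forwarding it looks like. The names `process_file` and `collect_dags` echo the issue, but the bodies here are hypothetical and do not reproduce Airflow 1.7.0's code: `process_file` just returns the path it would import, or `None` if safe mode skipped the file.

```python
import os


def process_file(path, safe_mode=True):
    """Hypothetical per-file step: return the path if it would be imported,
    or None if the safe-mode substring check skipped it."""
    if safe_mode:
        with open(path, "rb") as f:
            content = f.read()
        if not all(s in content for s in (b"DAG", b"airflow")):
            return None
    return path


def collect_dags(folder, safe_mode=True):
    """Walk `folder` and pass every .py file to process_file.

    Forwarding `safe_mode` is the whole fix: if the keyword were dropped
    here, process_file would fall back to its default of True and callers
    could never disable the check.
    """
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name.endswith(".py"):
                result = process_file(os.path.join(root, name),
                                      safe_mode=safe_mode)
                if result is not None:
                    found.append(result)
    return found
```

With the flag forwarded, a file like the reporter's, which builds its DAG through a helper module, is skipped by default but picked up when `safe_mode=False` is passed. The magic comment is no longer needed.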



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
