eladkal commented on a change in pull request #15183:
URL: https://github.com/apache/airflow/pull/15183#discussion_r606774296



##########
File path: docs/apache-airflow/common-pitfall.rst
##########
@@ -0,0 +1,202 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+
+Common Pitfalls
+===============
+
+Airflow Configurations
+^^^^^^^^^^^^^^^^^^^^^^
+
+Configuring parallelism
+-----------------------
+
+These configurations are executor-agnostic.
+
+- :ref:`config:core__parallelism`
+
+  The maximum number of task instances that Airflow can run concurrently.
+  This usually corresponds to the number of task instances in the
+  running state in the metadata database.
+
+- :ref:`config:core__dag_concurrency`
+
+  The maximum number of task instances allowed to run concurrently
+  per DAG. To calculate the number of tasks that are running concurrently
+  for a DAG, add up the number of running tasks across all DAG runs of the DAG.
+  This is configurable at the DAG level with ``concurrency``.
+
+- :ref:`config:core__max_active_runs_per_dag`
+
+  The maximum number of active DAG runs per DAG. The scheduler will not
+  create more DAG runs if it reaches the limit. This is configurable at
+  the DAG level with ``max_active_runs``.
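As an illustration of how these limits interact, here is a toy model in plain Python (not Airflow's actual scheduler code) that checks whether one more task instance may start under the ``parallelism`` and ``dag_concurrency`` limits; the default values shown match Airflow's shipped defaults:

```python
# Toy model of Airflow's concurrency checks; illustrative only,
# not the real scheduler implementation.

def can_start_task(running_total, running_in_dag,
                   parallelism=32, dag_concurrency=16):
    """Return True if one more task instance may enter the running state.

    running_total:  running task instances across all DAGs (core.parallelism)
    running_in_dag: running task instances across all runs of this DAG
                    (core.dag_concurrency / DAG-level ``concurrency``)
    """
    return running_total < parallelism and running_in_dag < dag_concurrency

# The global limit applies even when the per-DAG limit has headroom.
print(can_start_task(running_total=32, running_in_dag=3))   # False
print(can_start_task(running_total=10, running_in_dag=16))  # False
print(can_start_task(running_total=10, running_in_dag=3))   # True
```

Note that both conditions must hold: a single busy DAG can be throttled by ``dag_concurrency`` while the installation as a whole is throttled by ``parallelism``.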
+
+DAG Structure and DAG Parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Multiple DAG definitions per Python file
+----------------------------------------
+
+Airflow does support more than one DAG definition per Python file, but this is not recommended: Airflow aims for better isolation between DAGs from a fault and deployment perspective, and multiple DAGs in the same file work against that. For now, make sure that each DAG object is in the global namespace so that it can be recognized by Airflow.
+
+.. code-block:: python
+
+        globals()[dag_id] = DAG(...)
+
+Refer to :ref:`how to build dynamic DAGs<faq:dynamic_dag>`.
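For example, a loop can register several DAG objects in the module's global namespace. The pattern itself is plain Python, so the sketch below uses a minimal stand-in class instead of ``airflow.DAG``:

```python
class DAG:  # stand-in for airflow.DAG, for illustration only
    def __init__(self, dag_id):
        self.dag_id = dag_id

# Register one DAG object per id in the global namespace so that
# Airflow's DAG file processor can discover each of them.
for dag_id in ("example_dag_a", "example_dag_b"):
    globals()[dag_id] = DAG(dag_id)
```

In a real DAG file, ``DAG`` would come from ``airflow`` and the ids would typically be derived from external configuration.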
+
+Top level Python code
+----------------------
+
+While it is not recommended to write any code outside of defining Airflow constructs, Airflow does support arbitrary Python code as long as it does not break the DAG file processor or prolong file processing past the :ref:`config:core__dagbag_import_timeout` value.
+
+A common example is violating this time limit when building a dynamic DAG, which usually requires querying data from another service such as a database. At the same time, that service is swamped with requests from DAG file processors asking for the data needed to process the file. These unintended interactions may degrade the service and eventually cause DAG file processing to fail.
+
+Refer to :ref:`DAG writing best practices<best_practice:writing_a_dag>` for more information.
+
+Double Jinja Templating
+-----------------------
+
+It is not possible to render a Jinja template within another Jinja template. This is commonly attempted in ``user_defined_macros``.
+
+.. code-block:: python
+
+        dag = DAG(
+            ...
+            user_defined_macros={
+                'my_custom_macro': 'day={{ ds }}'
+            }
+        )
+
+        bo = BashOperator(
+            task_id='my_task',
+            bash_command="echo {{ my_custom_macro }}",
+            dag=dag
+        )
+
+This will echo "day={{ ds }}" instead of "day=2020-01-01" for a DAG run with the execution date 2020-01-01 00:00:00.
+
+.. code-block:: python
+
+        bo = BashOperator(
+            task_id='my_task',
+            bash_command="echo day={{ ds }}",
+            dag=dag
+        )
+
+By using the ``ds`` macro directly in the templated field, the rendered value results in "day=2020-01-01".
+
+Operators and Hooks
+^^^^^^^^^^^^^^^^^^^
+
+File templating and file extensions
+-----------------------------------
+
+``TemplateNotFound`` errors are usually caused by a mismatch between user expectations and the Operator fields that trigger Jinja templating. A common occurrence is with ``BashOperator``.
+
+Given that ``BashOperator``'s ``template_fields`` includes ``bash_command`` and its ``template_ext`` is a non-empty list, Airflow will attempt to render ``bash_command`` from the contents of a file, using the parameter value as the file path, if the parameter value ends in one of the listed file extensions.
+
+.. code-block:: python
+
+        bo = BashOperator(
+            task_id='my_script',
+            bash_command="/usr/local/airflow/include/test.sh",
+            dag=dag
+        )
+
+If you wish to directly execute a bash script, you need to add a space after the script name to prevent Airflow from rendering the template using Jinja.

Review comment:
       This is duplicate content. It's explained in https://github.com/apache/airflow/blob/master/docs/apache-airflow/howto/operator/bash.rst#jinja-template-not-found

       You can create a ref link to `bash.rst` instead.

##########
File path: docs/apache-airflow/best-practices.rst
##########
@@ -54,6 +59,12 @@ Some of the ways you can avoid producing a different result -
     You should define repetitive parameters such as ``connection_id`` or S3 paths in ``default_args`` rather than declaring them for each task.
     The ``default_args`` help to avoid mistakes such as typographical errors.
 
+Creating a custom Operator
+---------------------------
+
+When implementing custom operators, do not perform any expensive operations in their ``__init__``. Operators are instantiated once per scheduler run for each task that uses them, and making database calls there can significantly slow down scheduling and waste resources.
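For example, defer the expensive lookup from ``__init__`` to ``execute``. This sketch uses a stand-in base class and a hypothetical ``expensive_lookup`` helper rather than real Airflow imports:

```python
class BaseOperator:  # stand-in for airflow.models.BaseOperator
    def __init__(self, task_id):
        self.task_id = task_id

def expensive_lookup():
    # Hypothetical database call; imagine this taking seconds.
    return {"rows": 42}

class MyOperator(BaseOperator):
    def __init__(self, task_id):
        super().__init__(task_id)
        # Bad: self.data = expensive_lookup() here would run on
        # every scheduler parse of the DAG file.

    def execute(self, context=None):
        # Good: the expensive call runs only when the task executes.
        data = expensive_lookup()
        return data["rows"]
```

Instantiating ``MyOperator`` is now cheap; the costly work happens exactly once, inside the worker that runs the task.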

Review comment:
       This is probably something we need to mention in https://github.com/apache/airflow/blob/master/docs/apache-airflow/howto/custom-operator.rst#creating-a-custom-operator and create a ref link there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
