This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new acc81213e6 Add coding rule for operators to avoid logic in 
constructors (#37035)
acc81213e6 is described below

commit acc81213e6a9ec94e38ab32fcedbaea8d7b0a343
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri Jan 26 21:33:05 2024 +0100

    Add coding rule for operators to avoid logic in constructors (#37035)
    
    Related: #36484 #33786
---
 contributing-docs/05_pull_requests.rst | 48 +++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/contributing-docs/05_pull_requests.rst 
b/contributing-docs/05_pull_requests.rst
index c2d20670a1..87b158f93c 100644
--- a/contributing-docs/05_pull_requests.rst
+++ b/contributing-docs/05_pull_requests.rst
@@ -25,7 +25,7 @@ implementing them.
 .. contents:: :local:
 
 Pull Request guidelines
-=======================
+-----------------------
 
 Before you submit a Pull Request (PR) from your forked repo, check that it 
meets
 these guidelines:
@@ -101,7 +101,7 @@ these guidelines:
     This makes the lives of those who come after you (and your future self) a 
lot easier.
 
 Experimental Requirement to resolve all conversations
-=====================================================
+-----------------------------------------------------
 
 In December 2023 we enabled - experimentally - the requirement to resolve all 
the open conversations in a
 PR in order to make it merge-able. You will see in the status of the PR that 
it needs to have all the
@@ -121,14 +121,14 @@ already solved (assuming that maintainers will start 
treating the conversations
 .. _coding_style:
 
 Coding style and best practices
-===============================
+-------------------------------
 
 Most of our coding style rules are enforced programmatically by ruff and mypy, 
which are run automatically
 with static checks and on every Pull Request (PR), but there are some rules 
that are not yet automated and
 are more Airflow specific or semantic than style.
 
 Don't Use Asserts Outside Tests
--------------------------------
+...............................
 
 Our community agreed that to various reasons we do not use ``assert`` in 
production code of Apache Airflow.
 For details check the relevant `mailing list thread 
<https://lists.apache.org/thread.html/bcf2d23fcd79e21b3aac9f32914e1bf656e05ffbcb8aa282af497a2d%40%3Cdev.airflow.apache.org%3E>`_.
@@ -155,7 +155,7 @@ The one exception to this is if you need to make an assert 
for type checking (wh
 
 
 Database Session Handling
--------------------------
+.........................
 
 **Explicit is better than implicit.** If a function accepts a ``session`` 
parameter it should not commit the
 transaction itself. Session management is up to the caller.
@@ -202,7 +202,7 @@ compatibility considerations. In most cases, ``session`` 
argument should be last
 
 
 Don't use time() for duration calculations
------------------------------------------
+..........................................
 
 If you wish to compute the time difference between two events with in the same 
process, use
 ``time.monotonic()``, not ``time.time()`` nor ``timezone.utcnow()``.
@@ -242,6 +242,42 @@ If the start_date of a duration calculation needs to be 
stored in a database, th
 datetime objects. In all other cases, using datetime for duration calculation 
MUST be avoided as creating and
 diffing datetime operations are (comparatively) slow.
 
+Templated fields in Operator's __init__ method
+..............................................
+
+Airflow Operators might have some fields added to the list of 
``template_fields``. Such fields should be
+set in the constructor (``__init__`` method) of the operator and usually their 
values should
+come from the ``__init__`` method arguments. The reason for that is that the 
templated fields
+are evaluated at the time of the operator execution and when you pass 
arguments to the operator
+in the DAG, the fields that are set on the class just before the ``execute`` 
method is called
+are processed through templating engine and the fields values are set to the 
result of applying the
+templating engine to the fields (in case the field is a structure such as dict 
or list, the templating
+engine is applied to all the values of the structure).
+
+That's why we expect two things in case of ``template fields``:
+
+* with a few exceptions, only self.field = field should be happening in the 
operator's constructor
+* validation of the fields should be done in the ``execute`` method, not in 
the constructor because in
+  the constructor, the field value might be a templated value, not the final 
value.
+
+The exceptions are cases where we want to assign empty default value to a 
mutable field (list or dict)
+or when we have a more complex structure which we want to convert into a 
different format (say dict or list)
+but where we want to keep the original strings in the converted structure.
+
+In such cases we can usually do something like this
+
+.. code-block:: python
+
+    def __init__(self, *, my_field: list[str] = None, **kwargs):
+        super().__init__(**kwargs)
+        my_field = my_field or []
+        self.my_field = my_field
+
+The reason for doing it is that we are working on a cleaning up our code to 
have
+`pre-commit hook 
<../scripts/ci/pre_commit/pre_commit_validate_operators_init.py>`__
+that will make sure all the cases where logic (such as validation and complex 
conversion)
+is not done in the constructor are detected in PRs.
+
 -----------
 
 If you want to learn what are the options for your development environment, 
follow to the

Reply via email to