jedcunningham commented on code in PR #659:
URL: https://github.com/apache/airflow-site/pull/659#discussion_r973146899


##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,145 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been 
released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or the chart) and over 870 commits in total since 2.3.0. It includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.
+
+**Details**:
+
+📦 PyPI: https://pypi.org/project/apache-airflow/2.4.0/ \
+📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.4.0/ \
+🛠️ Release Notes: 
https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html \
+🐳 Docker Image: docker pull apache/airflow:2.4.0 \
+🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.4.0
+
+## Data-aware scheduling (AIP-48)
+
+This one is big. Airflow now has the ability to schedule DAGs based on other 
tasks updating datasets.
+
+What does this mean, exactly? This great new feature lets DAG authors create smaller, more self-contained DAGs that chain together into a larger data-based workflow. If you are currently using `ExternalTaskSensor` or `TriggerDagRunOperator` you should take a look at datasets -- in most cases you can replace them with something that will speed up the scheduling!
+
+But enough talking, let's look at a short example. First, let's write a simple DAG with a task called `my_task` that produces a dataset called `my-dataset`:
+
+```python
+from airflow import DAG, Dataset
+from airflow.decorators import task
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='producer', ...):
+    @task(outlets=[dataset])
+    def my_task():
+        ...
+```
+
+And then we can tell Airflow to schedule a DAG whenever this Dataset changes:
+
+```python
+from airflow import DAG, Dataset
+
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='dataset-consumer', schedule=[dataset]):
+    ...
+```
+
+With these two DAGs, the instant `my_task` finishes, Airflow will create the 
DAG run for the `dataset-consumer` workflow.
+
+If you have the producer and consumer DAGs in different files you do not need to use the same Dataset object: two `Dataset()`s created with the same URI are equal.

Review Comment:
   This feels a little abrupt and out of place right here? It's in the docs, 
might not even need it in this post.



##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,145 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been 
released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow

Review Comment:
   ```suggestion
   ```



##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,140 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been 
released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or the chart) and over 870 commits in total since 2.3.0. It includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.

Review Comment:
   features: 44
   improvements: 39
   bugs: 52
   misc: 53
   docs: 19
   
   



##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,145 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been 
released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or the chart) and over 870 commits in total since 2.3.0. It includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.
+
+**Details**:
+
+📦 PyPI: https://pypi.org/project/apache-airflow/2.4.0/ \
+📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.4.0/ \
+🛠️ Release Notes: 
https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html \
+🐳 Docker Image: docker pull apache/airflow:2.4.0 \
+🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.4.0
+
+## Data-aware scheduling (AIP-48)
+
+This one is big. Airflow now has the ability to schedule DAGs based on other 
tasks updating datasets.
+
+What does this mean, exactly? This great new feature lets DAG authors create smaller, more self-contained DAGs that chain together into a larger data-based workflow. If you are currently using `ExternalTaskSensor` or `TriggerDagRunOperator` you should take a look at datasets -- in most cases you can replace them with something that will speed up the scheduling!
+
+But enough talking, let's look at a short example. First, let's write a simple DAG with a task called `my_task` that produces a dataset called `my-dataset`:
+
+```python
+from airflow import DAG, Dataset
+from airflow.decorators import task
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='producer', ...):
+    @task(outlets=[dataset])
+    def my_task():
+        ...
+```
+
+And then we can tell Airflow to schedule a DAG whenever this Dataset changes:
+
+```python
+from airflow import DAG, Dataset
+
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='dataset-consumer', schedule=[dataset]):
+    ...
+```
+
+With these two DAGs, the instant `my_task` finishes, Airflow will create the 
DAG run for the `dataset-consumer` workflow.
+
+If you have the producer and consumer DAGs in different files you do not need to use the same Dataset object: two `Dataset()`s created with the same URI are equal.
+
+We know that what exists right now won't fit all use cases that people might 
wish for datasets, and in the coming minor releases (2.5, 2.6, etc.) we will 
expand and improve upon this foundation.
+
+Datasets represent the abstract concept of a dataset, and (for now) do not have any direct read or write capability. In this release we are adding the foundational feature that we will build upon in the future, and it's part of our goal to have smaller releases that get new features into your hands sooner!
+
+For more information on datasets, see the [documentation on Data-aware 
scheduling][data-aware-scheduling]. That includes details on how datasets are 
identified (URIs), how you can depend on multiple datasets, and how to think 
about what a dataset is (hint: don't include "date partitions" in a dataset, 
it's higher level than that).
+
+[data-aware-scheduling]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html
+
+## More improvements to Dynamic Task Mapping (AIP-42)
+
+You asked, we listened. Dynamic task mapping now includes support for:
+
+- `expand_kwargs`: To assign multiple parameters to a non-TaskFlow operator.
+- `zip`: To combine multiple upstream lists pairwise instead of as a cross-product.
+- `map`: To transform the parameters just before the task is run.
+
+For more information on dynamic task mapping, see the new sections of the doc 
on [Transforming Mapped Data][transforming-mapped-data], [Combining upstream 
data (aka "zipping")][task-mapping-zip], and [Assigning multiple parameters to 
a non-TaskFlow operator][expand-kwargs].
+
+[task-mapping-zip]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping

Review Comment:
   ```suggestion
   [task-mapping-zip]: 
https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping
   ```
   
   I wonder, should we go to the specific version to ensure the deep links work 
in the future?



##########
landing-pages/site/content/en/blog/airflow-2.4.0/index.md:
##########
@@ -0,0 +1,145 @@
+---
+title: "Apache Airflow 2.4.0: Data
+linkTitle: "Apache Airflow 2.4.0"
+author: "Ash Berlin-Taylor"
+github: "ashberlin"
+linkedin: "ashberlin-taylor"
+description: "We're proud to announce that Apache Airflow 2.4.0 has been 
released."
+tags: [Release]
+date: "2022-09-19"
+---
+
+Apache Airflow
+Apache Airflow 2.4.0 contains over 650 "user-facing" commits (excluding commits to providers or the chart) and over 870 commits in total since 2.3.0. It includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.
+
+**Details**:
+
+📦 PyPI: https://pypi.org/project/apache-airflow/2.4.0/ \
+📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.4.0/ \
+🛠️ Release Notes: 
https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html \
+🐳 Docker Image: docker pull apache/airflow:2.4.0 \
+🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.4.0
+
+## Data-aware scheduling (AIP-48)
+
+This one is big. Airflow now has the ability to schedule DAGs based on other 
tasks updating datasets.
+
+What does this mean, exactly? This great new feature lets DAG authors create smaller, more self-contained DAGs that chain together into a larger data-based workflow. If you are currently using `ExternalTaskSensor` or `TriggerDagRunOperator` you should take a look at datasets -- in most cases you can replace them with something that will speed up the scheduling!
+
+But enough talking, let's look at a short example. First, let's write a simple DAG with a task called `my_task` that produces a dataset called `my-dataset`:
+
+```python
+from airflow import DAG, Dataset
+from airflow.decorators import task
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='producer', ...):
+    @task(outlets=[dataset])
+    def my_task():
+        ...
+```
+
+And then we can tell Airflow to schedule a DAG whenever this Dataset changes:
+
+```python
+from airflow import DAG, Dataset
+
+
+dataset = Dataset(uri='my-dataset')
+
+with DAG(dag_id='dataset-consumer', schedule=[dataset]):
+    ...
+```
+
+With these two DAGs, the instant `my_task` finishes, Airflow will create the 
DAG run for the `dataset-consumer` workflow.
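
The mechanics above can be sketched in plain Python. This is a conceptual model only, not Airflow's actual scheduler internals; the names `consumers`, `queued_runs`, `register_consumer`, and `report_dataset_updated` are made up for illustration:

```python
from collections import defaultdict

# Conceptual model: the scheduler keeps a mapping from dataset URI to the
# DAGs that consume it, and queues a DAG run as soon as a producing task
# reports that the dataset was updated.
consumers = defaultdict(list)  # dataset URI -> list of consumer dag_ids
queued_runs = []               # dag_ids that just got a new DAG run

def register_consumer(dag_id: str, uri: str) -> None:
    """Record that `dag_id` is scheduled on updates to `uri`."""
    consumers[uri].append(dag_id)

def report_dataset_updated(uri: str) -> None:
    """Called when a task with this dataset in its outlets finishes."""
    for dag_id in consumers[uri]:
        queued_runs.append(dag_id)

register_consumer("dataset-consumer", "my-dataset")
report_dataset_updated("my-dataset")  # my_task just finished
```

After the producer reports the update, `queued_runs` holds `"dataset-consumer"`, mirroring how Airflow creates the consumer's DAG run immediately rather than polling on an interval.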
+
+If you have the producer and consumer DAGs in different files you do not need to use the same Dataset object: two `Dataset()`s created with the same URI are equal.
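
A quick way to convince yourself of those equality semantics, sketched with a hypothetical stand-in class (equality here is keyed on the URI, as with Airflow's `Dataset`):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    # Hypothetical stand-in mirroring URI-based equality: two instances
    # created independently (e.g. in different DAG files) compare equal
    # whenever their URIs match.
    uri: str

same_a = Dataset(uri="my-dataset")
same_b = Dataset(uri="my-dataset")   # e.g. defined in a different file
other = Dataset(uri="other-dataset")
```

Here `same_a == same_b` holds while `same_a == other` does not.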
+
+We know that what exists right now won't fit all use cases that people might 
wish for datasets, and in the coming minor releases (2.5, 2.6, etc.) we will 
expand and improve upon this foundation.
+
+Datasets represent the abstract concept of a dataset, and (for now) do not have any direct read or write capability. In this release we are adding the foundational feature that we will build upon in the future, and it's part of our goal to have smaller releases that get new features into your hands sooner!
+
+For more information on datasets, see the [documentation on Data-aware 
scheduling][data-aware-scheduling]. That includes details on how datasets are 
identified (URIs), how you can depend on multiple datasets, and how to think 
about what a dataset is (hint: don't include "date partitions" in a dataset, 
it's higher level than that).
+
+[data-aware-scheduling]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html
+
+## More improvements to Dynamic Task Mapping (AIP-42)
+
+You asked, we listened. Dynamic task mapping now includes support for:
+
+- `expand_kwargs`: To assign multiple parameters to a non-TaskFlow operator.
+- `zip`: To combine multiple upstream lists pairwise instead of as a cross-product.
+- `map`: To transform the parameters just before the task is run.
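
The difference between the default cross-product expansion, `zip`, and `map` can be sketched in plain Python. This is a conceptual illustration only; in Airflow these operations live on mapped-argument objects, not on plain lists:

```python
from itertools import product

paths = ["a.csv", "b.csv"]
dates = ["2022-09-01", "2022-09-02"]

# Default expand(x=paths, y=dates): one mapped task instance per
# cross-product entry -- four tasks here.
cross = list(product(paths, dates))

# zip: pair the inputs positionally instead -- only two mapped tasks.
pairwise = list(zip(paths, dates))

# map: transform each parameter just before the task runs.
upper = [p.upper() for p in paths]
```

With the lists above, `cross` has four entries, `pairwise` has two, and `upper` is `["A.CSV", "B.CSV"]`.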
+
+For more information on dynamic task mapping, see the new sections of the doc 
on [Transforming Mapped Data][transforming-mapped-data], [Combining upstream 
data (aka "zipping")][task-mapping-zip], and [Assigning multiple parameters to 
a non-TaskFlow operator][expand-kwargs].
+
+[task-mapping-zip]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping
+[transforming-mapped-data]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#transforming-mapped-data
+[expand-kwargs]: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/dynamic-task-mapping.html#assigning-multiple-parameters-to-a-non-taskflow-operator
+
+## Auto-register DAGs used in a context manager (no more `as dag:` needed)
+
+This one is a small quality of life improvement, and I don't want to admit how 
many times I forgot the `as dag:`, or worse, had `as dag:` repeated.
+
+```python
+with DAG(dag_id="example") as dag:
+    ...
+
+
+@dag
+def dag_maker():
+    ...
+
+
+dag2 = dag_maker()
+```
+
+can become
+
+```python
+with DAG(dag_id="example"):
+    ...
+
+
+@dag
+def my_dag():
+    ...
+
+
+my_dag()
+```
+
+If you want to disable the behaviour for any reason, set `auto_register=False` 
on the DAG:
+
+```python
+# This dag will not be picked up by Airflow as it's not assigned to a variable
+with DAG(dag_id="example", auto_register=False):
+    ...
+```
+
+## Removal of experimental Smart Sensors feature
+
+Smart Sensors were added in Airflow 2.0 and were deprecated in Airflow 2.2 in favor of deferrable operators. If you are using smart sensors, you will have to switch to deferrable operators before you can upgrade to Airflow 2.4.
+
+We're sorry to remove this feature (we didn't do it lightly) but to enable us 
to continue to grow and evolve Airflow we needed to remove this experimental 
code. We will only do this sort of change in a minor release for features 
marked as experimental. Any feature that is fully supported will only ever be 
removed in a major release.
+
+## Additional improvements
+
+With over 650 commits, the full list of features, fixes, and changes is too big to go into here (check out the release notes for a full list), but some noteworthy or interesting small features include:

Review Comment:
   Let's link out to the release notes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to