This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch v2-6-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 66e74c3a188bd76a335f9d948c1dd37ef35e2975
Author: Ephraim Anierobi <[email protected]>
AuthorDate: Fri Apr 14 14:05:24 2023 +0100

    Add release notes
---
 RELEASE_NOTES.rst                   | 261 ++++++++++++++++++++++++++++++++++++
 dev/airflow-github                  |  10 +-
 newsfragments/28172.misc.rst        |   1 -
 newsfragments/28538.misc.rst        |   1 -
 newsfragments/28892.improvement.rst |   1 -
 newsfragments/29506.significant.rst |   6 -
 newsfragments/29933.improvement.rst |   1 -
 newsfragments/30076.significant.rst |   3 -
 newsfragments/30152.significant.rst |   6 -
 newsfragments/30374.significant.rst |   5 -
 newsfragments/30375.significant.rst |   9 --
 11 files changed, 270 insertions(+), 34 deletions(-)

diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index 6da321c799..0aeb3cf17e 100644
--- a/RELEASE_NOTES.rst
+++ b/RELEASE_NOTES.rst
@@ -21,6 +21,267 @@

 .. towncrier release notes start

+Airflow 2.6.0 (2023-04-20)
+--------------------------
+
+Significant Changes
+^^^^^^^^^^^^^^^^^^^
+
+Default permissions of file task handler log directories and files has been changed to "owner + group" writeable (#29506).
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ Default setting handles case where impersonation is needed and both users (airflow and the impersonated user)
+ have the same group set as main group. Previously the default was also other-writeable and the user might choose
+ to use the other-writeable setting if they wish by configuring ``file_task_handler_new_folder_permissions``
+ and ``file_task_handler_new_file_permissions`` in ``logging`` section.
+
+SLA callbacks no longer add files to the dag processor manager's queue (#30076)
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ This stops SLA callbacks from keeping the dag processor manager permanently busy. It means reduced CPU,
+ and fixes issues where SLAs stop the system from seeing changes to existing dag files.
Additional metrics added to help track queue state.
+
+The ``cleanup()`` method in BaseTrigger is now defined as asynchronous (following async/await) pattern (#30152).
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ This is potentially a breaking change for any custom trigger implementations that override the ``cleanup()``
+ method and uses synchronous code, however using synchronous operations in cleanup was technically wrong,
+ because the method was executed in the main loop of the Triggerer and it was introducing unnecessary delays
+ impacting other triggers. The change is unlikely to affect any existing trigger implementations.
+
+The gauge ``scheduler.tasks.running`` no longer exist (#30374)
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ The gauge has never been working and its value has always been 0. Having an accurate
+ value for this metric is complex so it has been decided that removing this gauge makes
+ more sense than fixing it with no certainty of the correctness of its value.
+
+Consolidate handling of tasks stuck in queued under new ``task_queued_timeout`` config (#30375)
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ Logic for handling tasks stuck in the queued state has been consolidated, and the all configurations
+ responsible for timing out stuck queued tasks have been deprecated and merged into
+ ``[scheduler] task_queued_timeout``. The configurations that have been deprecated are
+ ``[kubernetes] worker_pods_pending_timeout``, ``[celery] stalled_task_timeout``, and
+ ``[celery] task_adoption_timeout``. If any of these configurations are set, the longest timeout will be
+ respected. For example, if ``[celery] stalled_task_timeout`` is 1200, and ``[scheduler] task_queued_timeout``
+ is 600, Airflow will set ``[scheduler] task_queued_timeout`` to 1200.
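A minimal sketch of what the ``cleanup()`` change (#30152) described above could mean for a custom trigger that overrides the method. The two classes are hypothetical stand-ins so that the snippet runs without Airflow installed; a real trigger would subclass ``airflow.triggers.base.BaseTrigger`` rather than the stub shown here.

```python
import asyncio


class SketchBaseTrigger:
    """Hypothetical stand-in for BaseTrigger; only cleanup() is modelled."""

    async def cleanup(self) -> None:
        """Default no-op, declared with async def as the note describes."""


class MyTrigger(SketchBaseTrigger):
    """Hypothetical custom trigger that releases a resource on cleanup."""

    def __init__(self) -> None:
        self.closed = False

    async def cleanup(self) -> None:
        # Await non-blocking work instead of calling blocking code, so the
        # event loop shared with other triggers is not held up.
        await asyncio.sleep(0)
        self.closed = True


async def main() -> bool:
    trigger = MyTrigger()
    # The caller now awaits the coroutine instead of calling a plain method.
    await trigger.cleanup()
    return trigger.closed


result = asyncio.run(main())
```

An override that is still written as a plain ``def`` would return ``None`` instead of a coroutine, which is why the note flags the change as potentially breaking.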
+
+Improvement Changes
+^^^^^^^^^^^^^^^^^^^
+
+Display only the running configuration in configurations view (#28892)
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ The configurations view now only displays the running configuration. Previously, the default configuration
+ was displayed at the top but it was not obvious whether this default configuration was overridden or not.
+ Subsequently, the non-documented endpoint ``/configuration?raw=true`` is deprecated and will be removed in
+ Airflow 3.0. The HTTP response now returns an additional ``Deprecation`` header. The ``/config`` endpoint on
+ the REST API is the standard way to fetch Airflow configuration programmatically.
+
+Explicit skipped states list for ExternalTaskSensor (#29933)
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ ExternalTaskSensor now has an explicit ``skipped_states`` list
+
+Miscellaneous Changes
+^^^^^^^^^^^^^^^^^^^^^
+
+Handle OverflowError on exponential backoff in next_run_calculation (#28172)
+""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+ Maximum retry task delay is set to be 24h (86400s) by default. You can change it globally via ``core.max_task_retry_delay``
+ parameter.
+
+Move Hive macros to the provider (#28538)
+"""""""""""""""""""""""""""""""""""""""""
+ The Hive Macros (``hive.max_partition``, ``hive.closest_ds_partition``) are available only when Hive Provider is
+ installed. Please install Hive Provider > 5.1.0 when using those macros.
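The retry-delay cap from #28172 can be illustrated with a small capped exponential backoff. This is a sketch only, not Airflow's retry logic: the function name, its signature, and the plain doubling formula are assumptions, and only the 86400-second default comes from the note above.

```python
from datetime import timedelta

# Default cap taken from the note: 24h (86400s).
MAX_TASK_RETRY_DELAY_SECONDS = 86400


def capped_backoff(
    base_delay: timedelta,
    try_number: int,
    max_delay_seconds: int = MAX_TASK_RETRY_DELAY_SECONDS,
) -> timedelta:
    """Double the delay on each try, never exceeding the configured maximum."""
    max_delay = timedelta(seconds=max_delay_seconds)
    try:
        # timedelta raises OverflowError once the product leaves its valid range.
        delay = base_delay * (2 ** (try_number - 1))
    except OverflowError:
        return max_delay
    return min(delay, max_delay)


first = capped_backoff(timedelta(seconds=300), 1)
third = capped_backoff(timedelta(seconds=300), 3)
huge = capped_backoff(timedelta(seconds=300), 5000)
```

With a 300-second base delay, early tries grow normally, while a very large ``try_number`` falls back to the cap instead of raising.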
+
+New Features
+^^^^^^^^^^^^
+- Add ``max_active_tis_per_dagrun`` for Dynamic Task Mapping (#29094)
+- Add serializer for pandas dataframe (#30390)
+- Deferrable ``TriggerDagRunOperator`` (#30292)
+- Add command to get DAG Details via CLI (#30432)
+- Adding ContinuousTimetable and support for @continuous schedule_interval (#29909)
+- Allow customized rules to check if a file has dag (#30104)
+- Add a new Airflow conf to specify a SSL ca cert for Kubernetes client (#30048)
+- Bash sensor has an explicit retry code (#30080)
+- Add filter task upstream/downstream to grid view (#29885)
+- Add testing a connection via Airflow CLI (#29892)
+- Support deleting the local log files when using remote logging (#29772)
+- ``Blocklist`` to disable specific metric tags or metric names (#29881)
+- Add a new graph inside of the grid view (#29413)
+- Add database ``check_migrations`` config (#29714)
+- add output format arg for ``cli.dags.trigger`` (#29224)
+- Make json and yaml available in templates (#28930)
+- Enable tagged metric names for existing Statsd metric publishing events | influxdb-statsd support (#29093)
+- Add arg --yes to ``db export-archived`` command.
(#29485)
+- Make the policy functions pluggable (#28558)
+- Add ``airflow db drop-archived`` command (#29309)
+- Enable individual trigger logging (#27758)
+- Implement new filtering options in graph view (#29226)
+- Add triggers for ExternalTask (#29313)
+- Add command to export purged records to CSV files (#29058)
+- Add ``FileTrigger`` (#29265)
+- Emit DataDog statsd metrics with metadata tags (#28961)
+- add some statsd metrics for dataset (#28907)
+- Add --overwrite option to ``connections import`` CLI command (#28738)
+- Add general-purpose "notifier" concept to DAGs (#28569)
+- add a new conf to wait past_deps before skipping a task (#27710)
+- Add Flink on K8s Operator (#28512)
+- Allow Users to disable SwaggerUI via configuration (#28354)
+- Show mapped task groups in graph (#28392)
+- Log FileTaskHandler to work with KubernetesExecutor's multi_namespace_mode (#28436)
+- Add a new config for adapting masked secrets to make it easier to prevent secret leakage in logs (#28239)
+- List specific config section and its values using the cli (#28334)
+- KubernetesExecutor multi_namespace_mode can use namespace list to avoid requiring cluster role (#28047)
+- Automatically save and allow restore of recent DAG run configs (#27805)
+- Added exclude_microseconds to cli (#27640)
+
+Improvements
+""""""""""""
+- Preload airflow imports before dag parsing to save time (#30495)
+- Improve task & run actions ``UX`` in grid view (#30373)
+- Speed up TaskGroups with caching property of group_id (#30284)
+- Use the engine provided in the session (#29804)
+- Type related import optimization for Executors (#30361)
+- Add more type hints to the code base (#30503)
+- some fixes to metrics doc (#30290)
+- Always use self.appbuilder.get_session in security managers (#30233)
+- Update SQLAlchemy ``select()`` to new style (#30515)
+- Refactor out xcom constants from models (#30180)
+- Add exception class name to DAG-parsing error message (#30105)
+- Rename statsd_allow_list and
statsd_block_list to ``metrics_*_list`` (#30174)
+- Improve serialization of tuples and sets (#29019)
+- Make cleanup method in trigger an async one (#30152)
+- Lazy load serialization modules (#30094)
+- SLA callbacks no longer add files to the dag_processing manager queue (#30076)
+- Add task.trigger rule to grid_data (#30130)
+- Speed up log template sync by avoiding ORM (#30119)
+- Separate cli_parser.py into two modules (#29962)
+- Explicit skipped states list for ExternalTaskSensor (#29933)
+- Add task state hover highlighting to new graph (#30100)
+- Store grid tabs in url params (#29904)
+- Use custom Connexion resolver to load lazily (#29992)
+- Delay Kubernetes import in secret masker (#29993)
+- Delay ConnectionModelView init until it's accessed (#29946)
+- Scheduler, make stale DAG deactivation threshold configurable instead of using dag processing timeout (#29446)
+- Improve grid view height calculations (#29563)
+- Avoid importing executor during conf validation (#29569)
+- Make permissions for FileTaskHandler group-writeable and configurable (#29506)
+- Add colors in help outputs of Airflow CLI commands #28789 (#29116)
+- Add a param for get_dags endpoint to list only unpaused dags (#28713)
+- Expose updated_at filter for dag run and task instance endpoints (#28636)
+- Increase length of user identifier columns (#29061)
+- Update gantt chart UI to display queued state of tasks (#28686)
+- Add index on log.dttm (#28944)
+- Display only the running configuration in configurations view (#28892)
+- css, cap dropdown menu size dynamically (#28736)
+- added JSON linter to connection edit / add UI for field extra. On connection edit screen, existing extra data will be displayed indented (#28583)
+- Use labels instead of pod name for pod log read in k8s exec (#28546)
+- Use time not tries for queued & running re-checks.
(#28586)
+- CustomTTYColoredFormatter should inherit TimezoneAware formatter (#28439)
+- Improve past depends handling in Airflow CLI tasks.run command (#28113)
+- Support using a list of callbacks in ``on_*_callback/sla_miss_callbacks`` (#28469)
+- Better table name validation for db clean (#28246)
+- Use object instead of array in config.yml for config template (#28417)
+- Add markdown rendering for task notes. (#28245)
+- Show mapped task groups in grid view (#28208)
+- Add ``renamed`` and ``previous_name`` in config sections (#28324)
+- Speed up most Users/Role CLI commands (#28259)
+- Speed up Airflow role list command (#28244)
+- Refactor serialization (#28067)
+- Allow longer pod names for k8s executor / KPO (#27736)
+- Updates health check endpoint to include ``triggerer`` status (#27755)
+
+
+Bug Fixes
+"""""""""
+- Simplify logic to resolve tasks stuck in queued despite stalled_task_timeout (#30375)
+- When clearing task instances try to get associated DAGs from database (#29065)
+- Fix mapped tasks partial arguments when DAG default args are provided (#29913)
+- Deactivate DAGs deleted from within zip files (#30608)
+- Recover from ``too old resource version exception`` by retrieving the latest ``resource_version`` (#30425)
+- Fix possible race condition when refreshing DAGs (#30392)
+- Use custom validator for OpenAPI request body (#30596)
+- Fix ``TriggerDagRunOperator`` with deferrable parameter (#30406)
+- Speed up dag runs deletion (#30330)
+- Do not use template literals to construct html elements (#30447)
+- Fix deprecation warning in ``example_sensor_decorator`` DAG (#30513)
+- Avoid logging sensitive information in triggerer job log (#30110)
+- Add a new parameter for base sensor to catch the exceptions in poke method (#30293)
+- Fix dag run conf encoding with non-JSON serializable values (#28777)
+- Added fixes for Airflow to be usable on Windows Dask-Workers (#30249)
+- fix: force DAG last modified time to UTC (#30243)
+- Fix EmptySkipOperator
in example dag (#30269)
+- Make the webserver startup respect update_fab_perms (#30246)
+- Ignore error when changing log folder permissions (#30123)
+- Disable ordering DagRuns by note (#30043)
+- try_number was not being passed to the get_task_log method, instead (#28817)
+- Mask out non-access bits when comparing file modes (#29886)
+- Remove Run task action from UI (#29706)
+- Fix log tailing issues with legacy log view (#29496)
+- Fixes to how DebugExecutor handles sensors (#28528)
+- Ensure that pod_mutation_hook is called before logging the pod name (#28534)
+- Handle OverflowError on exponential backoff in next_run_calculation (#28172)
+
+Misc/Internal
+"""""""""""""
+- Remove gauge ``scheduler.tasks.running`` (#30374)
+- Bump json5 to 1.0.2 and eslint-plugin-import to 2.27.5 in ``/airflow/www`` (#30568)
+- Add tests to PythonOperator (#30362)
+- Add asgiref as a core dependency (#30527)
+- Discovery safe mode toggle comment clarification (#30459)
+- fix: upgrade moment-timezone package to fix Tehran tz (#30455)
+- Bump loader-utils from 2.0.0 to 2.0.4 in ``/airflow/www`` (#30319)
+- Bump babel-loader from 8.1.0 to 9.1.0 in ``/airflow/www`` (#30316)
+- DagBag: Use ``dag.fileloc`` instead of ``dag.full_filepath`` in exception message (#30610)
+- Change log level of serialization information (#30239)
+- Minor DagRun helper method cleanup (#30092)
+- Improve type hinting in stats.py (#30024)
+- Limit ``importlib-metadata`` backport to < 5.0.0 (#29924)
+- Align cncf provider file names with AIP-21 (#29905)
+- Upgrade FAB to 4.3.0 (#29766)
+- Clear ExecutorLoader cache in tests (#29849)
+- Lazy load Task Instance logs in UI (#29827)
+- added warning log for max page limit exceeding api calls (#29788)
+- Aggressively cache entry points in process (#29625)
+- Don't use ``importlib.metadata`` to get Version for speed (#29723)
+- Upgrade Mypy to 1.0 (#29468)
+- Rename ``db export-cleaned`` to ``db export-archived`` (#29450)
+- listener: simplify API by replacing
SQLAlchemy event-listening by direct calls (#29289)
+- No multi-line log entry for bash env vars (#28881)
+- Switch to ruff for faster static checks (#28893)
+- Remove horizontal lines in TI logs (#28876)
+- Make allowed_deserialization_classes more intuitive (#28829)
+- Propagate logs to stdout when in k8s executor pod (#28440)
+- Fix code readability, add docstrings to json_client (#28619)
+- AIP-51 - Misc. Compatibility Checks (#28375)
+- Fix is_local for LocalKubernetesExecutor (#28288)
+- Move Hive macros to the provider (#28538)
+- Rerun flaky PinotDB integration test (#28562)
+- Add pre-commit hook to check session default value (#28007)
+- Refactor get_mapped_group_summaries for web UI (#28374)
+- Add support for k8s 1.26 (#28320)
+- Replace ``freezegun`` with time-machine (#28193)
+- Completed D400 for ``airflow/kubernetes/*`` (#28212)
+- Completed D400 for multiple folders (#27969)
+- Drop k8s 1.21 and 1.22 support (#28168)
+- Remove unused task_queue attr from k8s scheduler class (#28049)
+- Completed D400 for multiple folders (#27767, #27768)
+
+
+Doc only changes
+""""""""""""""""
+- docs: use correct import path for Dataset (#30617)
+- Create ``audit_logs.rst`` (#30405)
+- Adding taskflow API example for sensors (#30344)
+- add clarification about timezone aware dags (#30467)
+- Clarity params documentation (#30345)
+- fix(metrics): fix unit for task duration metric (#30273)
+- Update dag-run.rst for dead links of cli commands (#30254)
+- Add Write efficient Python code section to Reducing DAG complexity (#30158)
+- Allow to specify which connection, variable or config are being looked up in the backend using ``*_lookup_pattern`` parameters (#29580)
+- Add Documentation for notification feature extension (#29191)
+- Clarify that executor interface is public but instances are not (#29200)
+- Add Public Interface description to Airflow documentation (#28300)
+- Add documentation for task group mapping (#28001)
+
+
 Airflow 2.5.3 (2023-04-01)
 --------------------------

diff --git a/dev/airflow-github b/dev/airflow-github
index a3453d56fb..d756f06fab 100755
--- a/dev/airflow-github
+++ b/dev/airflow-github
@@ -310,7 +310,15 @@ def changelog(previous_version, target_version, github_token):
         issue_type = get_issue_type(issue)
         files = files_touched(repo, commit["id"])
         if is_core_commit(files):
-            sections[issue_type].append(commit["subject"])
+            if issue_type in ["bug-fix", "doc-only", "misc/internal"]:
+                with open("../RELEASE_NOTES.rst") as file:
+                    for line in file.readlines():
+                        if line.endswith(f"(#{commit['id']})"):
+                            continue
+                        else:
+                            sections[issue_type].append(commit["subject"])
+            else:
+                sections[issue_type].append(commit["subject"])
         else:
             sections[DEFAULT_SECTION_NAME].append(commit["subject"])

diff --git a/newsfragments/28172.misc.rst b/newsfragments/28172.misc.rst
deleted file mode 100644
index 8b47c9749c..0000000000
--- a/newsfragments/28172.misc.rst
+++ /dev/null
@@ -1 +0,0 @@
-Maximum retry task delay is set to be 24h (86400s) by default. You can change it globally via ``core.max_task_retry_delay`` parameter.
diff --git a/newsfragments/28538.misc.rst b/newsfragments/28538.misc.rst
deleted file mode 100644
index 5b929d8448..0000000000
--- a/newsfragments/28538.misc.rst
+++ /dev/null
@@ -1 +0,0 @@
-The Hive Macros (``hive.max_partition``, ``hive.closest_ds_partition``) are available only when Hive Provider is installed. Please install Hive Provider > 5.1.0 when using those macros.
diff --git a/newsfragments/28892.improvement.rst b/newsfragments/28892.improvement.rst
deleted file mode 100644
index ee27b97d5a..0000000000
--- a/newsfragments/28892.improvement.rst
+++ /dev/null
@@ -1 +0,0 @@
-The configurations view now only displays the running configuration. Previously, the default configuration was displayed at the top but it wasn't obvious whether this default configuration was overridden or not.
Subsequently, the non-documented endpoint ``/configuration?raw=true`` is deprecated and will be removed in Airflow 3.0. The HTTP response now returns an additional ``Deprecation`` header. The ``/config`` endpoint on the REST API is the standard way to fetch Airflow configuration [...]
diff --git a/newsfragments/29506.significant.rst b/newsfragments/29506.significant.rst
deleted file mode 100644
index 1e0555c13a..0000000000
--- a/newsfragments/29506.significant.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-Default permissions of file task handler log directories and files has been changed to "owner + group" writeable.
-
-Default setting handles case where impersonation is needed and both users (airflow and the impersonated user)
-have the same group set as main group. Previously the default was also other-writeable and the user might choose
-to use the other-writeable setting if they wish by configuring ``file_task_handler_new_folder_permissions``
-and ``file_task_handler_new_file_permissions`` in ``logging`` section.
diff --git a/newsfragments/29933.improvement.rst b/newsfragments/29933.improvement.rst
deleted file mode 100644
index fd3de0f713..0000000000
--- a/newsfragments/29933.improvement.rst
+++ /dev/null
@@ -1 +0,0 @@
-ExternalTaskSensor now has an explicit ``skipped_states`` list
diff --git a/newsfragments/30076.significant.rst b/newsfragments/30076.significant.rst
deleted file mode 100644
index 83805af118..0000000000
--- a/newsfragments/30076.significant.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-SLA callbacks no longer add files to the dag processor manager's queue
-
-This stops SLA callbacks from keeping the dag processor manager permanently busy. It means reduced CPU, and fixes issues where SLAs stop the system from seeing changes to existing dag files. Additional metrics added to help track queue state.
diff --git a/newsfragments/30152.significant.rst b/newsfragments/30152.significant.rst
deleted file mode 100644
index 5b0325bbe8..0000000000
--- a/newsfragments/30152.significant.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-The ``cleanup()`` method in BaseTrigger is now defined as asynchronous (following async/await) pattern.
-
-This is potentially a breaking change for any custom trigger implementations that override the ``cleanup()``
-method and uses synchronous code, however using synchronous operations in cleanup was technically wrong,
-because the method was executed in the main loop of the Triggerer and it was introducing unnecessary delays
-impacting other triggers. The change is unlikely to affect any existing trigger implementations.
diff --git a/newsfragments/30374.significant.rst b/newsfragments/30374.significant.rst
deleted file mode 100644
index d6c32cbdae..0000000000
--- a/newsfragments/30374.significant.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-The gauge ``scheduler.tasks.running`` no longer exist
-
-The gauge has never been working and its value has always been 0. Having an accurate
-value for this metric is complex so it has been decided that removing this gauge makes
-more sense than fixing it with no certainty of the correctness of its value.
diff --git a/newsfragments/30375.significant.rst b/newsfragments/30375.significant.rst
deleted file mode 100644
index d2fd1a87f2..0000000000
--- a/newsfragments/30375.significant.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-Consolidate handling of tasks stuck in queued under new ``task_queued_timeout`` config
-
-Logic for handling tasks stuck in the queued state has been consolidated, and the all configurations
-responsible for timing out stuck queued tasks have been deprecated and merged into
-``[scheduler] task_queued_timeout``. The configurations that have been deprecated are
-``[kubernetes] worker_pods_pending_timeout``, ``[celery] stalled_task_timeout``, and
-``[celery] task_adoption_timeout``.
If any of these configurations are set, the longest timeout will be
-respected. For example, if ``[celery] stalled_task_timeout`` is 1200, and ``[scheduler] task_queued_timeout``
-is 600, Airflow will set ``[scheduler] task_queued_timeout`` to 1200.
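
The "longest timeout wins" rule described for ``task_queued_timeout`` (#30375) can be sketched as follows. This is a minimal illustration and not Airflow's implementation: the function name and the mapping of deprecated values are hypothetical, and only the option names and the 1200/600 example come from the note.

```python
from typing import Mapping, Optional


def effective_task_queued_timeout(
    task_queued_timeout: float,
    deprecated_timeouts: Mapping[str, Optional[float]],
) -> float:
    """Return the longest of the new timeout and any deprecated timeouts that are set."""
    # Deprecated options left unset are represented as None and ignored.
    configured = [value for value in deprecated_timeouts.values() if value is not None]
    return max([task_queued_timeout, *configured])


# Values from the example in the note: stalled_task_timeout=1200, task_queued_timeout=600.
timeout = effective_task_queued_timeout(
    600,
    {
        "[kubernetes] worker_pods_pending_timeout": None,
        "[celery] stalled_task_timeout": 1200,
        "[celery] task_adoption_timeout": None,
    },
)
print(timeout)  # 1200
```

When none of the deprecated options is set, the function simply returns the value of ``[scheduler] task_queued_timeout``.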
