uranusjr commented on code in PR #25410:
URL: https://github.com/apache/airflow/pull/25410#discussion_r941040515
##########
airflow/models/dag.py:
##########
@@ -236,11 +237,16 @@ class DAG(LoggingMixin):
:param dag_id: The id of the DAG; must consist exclusively of alphanumeric
characters, dashes, dots and underscores (all ASCII)
:param description: The description for the DAG to e.g. be shown on the
webserver
+ :param schedule: Defines the rules according to which DAG runs are
scheduled. Can
+ accept cron string, timedelta object, Timetable, or list of Dataset
objects.
+ See also :doc:`/howto/timetable`.
:param schedule_interval: Defines how often that DAG runs, this
timedelta object gets added to your latest task instance's
- execution_date to figure out the next schedule
+ execution_date to figure out the next schedule.
+ Note: deprecated in Airflow 2.4; use `schedule` instead.
:param timetable: Specify which timetable to use (in which case
schedule_interval
must not be set). See :doc:`/howto/timetable` for more information
+ Note: deprecated in Airflow 2.4; use `schedule` instead.
Review Comment:
Should we just remove these entirely from the docstring? New users don’t
need to know about them, and those who are using them don’t need to read the
docs on them anyway.
##########
airflow/models/dag.py:
##########
@@ -313,7 +319,7 @@ class DAG(LoggingMixin):
to render templates as native Python types. If False, a Jinja
``Environment`` is used to render templates as string values.
:param tags: List of tags to help filtering DAGs in the UI.
- :param schedule_on: List of upstream datasets if for use in triggering DAG
runs.
+ :param schedule: List of upstream datasets if for use in triggering DAG
runs.
Review Comment:
Should be removed entirely.
##########
airflow/models/dag.py:
##########
@@ -460,18 +467,46 @@ def __init__(
self.default_args['end_date'] =
timezone.convert_to_utc(self.default_args['end_date'])
# sort out DAG's scheduling behavior
- scheduling_args = [schedule_interval, timetable, schedule_on]
+ scheduling_args = [schedule_interval, timetable, schedule]
if not at_most_one(*scheduling_args):
- raise ValueError(
- "At most one allowed for args 'schedule_interval',
'timetable', and 'schedule_on'."
+ raise ValueError("At most one allowed for args
'schedule_interval', 'timetable', and 'schedule'.")
+ if schedule_interval is not NOTSET:
+ warnings.warn(
+ "Param `schedule_interval` is deprecated and will be removed
in a future release. "
+ "Please use `schedule` instead. ",
+ DeprecationWarning,
+ stacklevel=2,
+ )
+ if timetable is not None:
+ warnings.warn(
+ "Param `timetable` is deprecated and will be removed in a
future release. "
+ "Please use `schedule` instead. ",
+ DeprecationWarning,
+ stacklevel=2,
)
-
self.timetable: Timetable
self.schedule_interval: ScheduleInterval
- self.schedule_on: Optional[List["Dataset"]] = list(schedule_on) if
schedule_on else None
- if schedule_on:
- if not isinstance(schedule_on, Sequence):
- raise ValueError("Param `schedule_on` must be
Sequence[Dataset]")
+ self.dataset_triggers: Optional[List[Dataset]] = None
+
+ if schedule is not NOTSET:
+ if isinstance(schedule, List):
+ # if List, only support List[Dataset]
+ if any(isinstance(x, Dataset) for x in schedule):
+ if not all(isinstance(x, Dataset) for x in schedule):
+ raise ValueError(
+ "If scheduling DAG with List[Dataset], all
elements must be Dataset."
+ )
+ self.dataset_triggers = list(schedule)
+ else:
+ raise ValueError(
+ "Use of List object with `schedule` param is only
supported for List[Dataset]."
+ )
+ elif isinstance(schedule, Timetable):
+ timetable = schedule
+ else: # assumed to be ScheduleIntervalArg
+ schedule_interval = schedule
Review Comment:
```suggestion
if isinstance(schedule, List):
# if List, only support List[Dataset]
if any(isinstance(x, Dataset) for x in schedule):
if not all(isinstance(x, Dataset) for x in schedule):
raise ValueError(
"If scheduling DAG with List[Dataset], all elements
must be Dataset."
)
self.dataset_triggers = list(schedule)
else:
raise ValueError(
"Use of List object with `schedule` param is only
supported for List[Dataset]."
)
elif isinstance(schedule, Timetable):
timetable = schedule
elif schedule is not None:
schedule_interval = schedule
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]