[GitHub] [airflow] hugochinchilla commented on a diff in pull request #28333: Allow a DAG to be scheduled when any related DataSet is updated

GitBox Tue, 13 Dec 2022 08:25:56 -0800


hugochinchilla commented on code in PR #28333:
URL: https://github.com/apache/airflow/pull/28333#discussion_r1047395177



##########
airflow/migrations/versions/0123_2_6_0_add_dataset_schedule_mode_to_dag_model.py:
##########
@@ -0,0 +1,45 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add dataset schedule mode to DAG model
+
+Revision ID: 7afd6d4021a9
+Revises: 290244fb8b83
+Create Date: 2022-12-13 11:34:26.648465
+
+"""
+
+from __future__ import annotations
+
+import sqlalchemy as sa
+from alembic import op
+
+# revision identifiers, used by Alembic.
+revision = "7afd6d4021a9"
+down_revision = "290244fb8b83"
+branch_labels = None
+depends_on = None
+airflow_version = "2.6.0"
+
+
+def upgrade():
+    op.add_column("dag", sa.Column("run_on_any_dataset_changed", sa.Boolean(), nullable=False, default=False))

Review Comment:
   I understand your concern, but if we don't plan to implement more modes right now, we should go with the simplest implementation that works.
   
   Because it is the simplest implementation, it will also be easy to change in the future if we need to.
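
A minimal sketch of what such a later change could look like, using the standard-library `sqlite3` module rather than Alembic so it runs on its own. The `dataset_schedule_mode` column and the `'any'`/`'all'` values are assumptions made for illustration; they are not part of this PR.

```python
import sqlite3


def migrate_flag_to_mode(conn: sqlite3.Connection) -> None:
    """Derive a hypothetical mode column from the existing boolean flag."""
    # Hypothetical column name and values, chosen only for this sketch.
    conn.execute(
        "ALTER TABLE dag ADD COLUMN dataset_schedule_mode TEXT NOT NULL DEFAULT 'all'"
    )
    conn.execute(
        "UPDATE dag SET dataset_schedule_mode = 'any' "
        "WHERE run_on_any_dataset_changed = 1"
    )


# Stand-in for the "dag" table after the boolean column has been added.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag ("
    "dag_id TEXT PRIMARY KEY, "
    "run_on_any_dataset_changed BOOLEAN NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO dag VALUES (?, ?)", [("a", 1), ("b", 0)])

migrate_flag_to_mode(conn)
modes = dict(conn.execute("SELECT dag_id, dataset_schedule_mode FROM dag"))
```

The point is only that a boolean maps mechanically onto a richer column later, so starting with the flag does not close off that option.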
   
   Otherwise you risk committing to a model that may not fit the needs you actually have later. It's always good to postpone these decisions until you have better context and a better understanding of the problem you need to solve.
   
   For example, what if one of these future modes is something like "quorum", where you want to run whenever at least 1/2 or 2/3 (or any fraction) of the datasets have been updated? Where would you store this number? It's better to postpone these kinds of decisions until you have the full context of what you're trying to accomplish.
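
To make the "quorum" example concrete, here is a rough sketch of such a check. The function name and its parameters are hypothetical and do not exist in Airflow. It shows that this mode needs an extra value (the fraction) that a single mode column could not hold.

```python
from fractions import Fraction


def quorum_reached(updated: int, total: int, quorum: Fraction) -> bool:
    """Return True when at least `quorum` of the datasets have been updated.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    if total <= 0:
        # A DAG with no datasets can never reach a quorum.
        return False
    # Exact rational comparison avoids float rounding at the 2/3 boundary.
    return Fraction(updated, total) >= quorum


half = Fraction(1, 2)
two_thirds = Fraction(2, 3)
```

For example, `quorum_reached(1, 2, half)` holds, while `quorum_reached(1, 3, two_thirds)` does not. The `quorum` argument is exactly the number whose storage location would have to be decided.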



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
