jscheffl commented on code in PR #37888:
URL: https://github.com/apache/airflow/pull/37888#discussion_r1513501456


##########
airflow/jobs/scheduler_job_runner.py:
##########
@@ -1284,9 +1284,17 @@ def _create_dag_runs_dataset_triggered(
                     events=dataset_events,
                 )
 
+                run_conf = {}
+                for item in dataset_events:
+                    event: DatasetEvent = item
+                    extra: dict | None = event.extra
+                    if extra:
+                        run_conf.update(extra)

Review Comment:
   The code was just an idea - not finally thought through. If you say "heavy 
handed"... what do you mean? too brute force and the user does not know what 
comes out? Or do you mean we need a better merging mechanism? Or some hooking 
to be able to inject a custom merging strategy? Or just "code is ugly" :-D 
   
   Background: I would assume that in 90% of cases a single DAG triggers a 
dataset. THeremight be cases where multiple events come together to trigger. In 
such case we need to merge `extra`s. I'd assume most times it is "conflict 
free" but you never know. Might be a feature to have it "last property wins" to 
collect events but otherwise if users feel there are too many conflicts, 
individual extras can also be produced "conflict free" with individual keys.
   
   `conf` vs. `params`:
   
   Yes, `params` and `dag_run.conf` somehow should be merged. I believe this is 
a leftover in the API from the past. CONF is the dict which is used to trigger 
a DAG. The conf is persisted as blob with the DagRun.
   During runtime the conf is available in the context as dict, representing 
1:1 the conf used to trigger. No validation. Just a dict.
   `params` in contrast have default values, conf is setting values on top and 
the result is JSON validated.
   Both `üarams`and `conf`are available in the context and can be used. I 
believre mid-term we should deprecate the usage of conf in the DAG and 
consolidate to the (more and better functional) params. But for today params 
only exist during runtime.
   @hussein-awala did an attempt here but it dd not make it to finish line: 
#29174 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to