mjpieters opened a new issue #11893: URL: https://github.com/apache/airflow/issues/11893
I was looking into smart sensors, to see if our sensors could be adapted to their use. Unfortunately, the current implementation has some issues, specifically around the execution and poke contexts: - The term _execution context_ is used differently from what I expected. It's a number of sensor attributes that are needed to do proper housekeeping around the sensor lifetime, transferred from the original sensor operator to the `SensorInstance` entry so you don't actually have to load the original sensor class each time. But the `BaseOperator.execute()` method is also passed a context, created from `TaskInstance.get_template_context()`, which is passed on verbatim to the `BaseSensorOperator.poke()` method. The term used implies that the smart sensor execution context would be the same context, but it is not. - The term _poke context_ is confusing because it is used for two purposes: to create an instance of the sensor class inside of the smart sensor, **and** to pass to the `.poke()` method when executing the wrapped sensor. The latter makes it quite hard to repurpose an existing sensor, one that uses elements from the standard execution context: the object produced by `TaskInstance.get_template_context()`. If I were to add things like, say, `execution_date` to `poke_context_fields`, I also have to update my sensor class `__init__` method to ignore those extra parameters. Moreover, it makes _no logical sense_ to pass in the same mapping to both construct the instance, and to the poke method? The initialiser can just set the right values on `self`, and poke(...) can just access anything pertinent directly from there. To me, the execution context is really the _smart sensor context_, and there really needs to be a separate _operator context_, that differs from the _poke context_. The operator context is what you use to create the sensor task in a DAG, and the poke context captures the dag-run specific information. The _smart sensor_ and _operator_ contexts could be sourced together (it doesn't really matter to the _operator_ that the smart sensor needs to keep some of information separate for its own bookkeeping; just pick up the `poke_interval`, `retries`, etc. entries once you collected the context). And the poke context, as passed to the `poke` method, should include some standard entries. It's fine if what is 'standard' differs from the template context, as long as it is easy enough for the `poke()` implementation to alter how it gets its information based on what is present. E.g. pass in SensorWork's `dag_id`, `execution_date` and `try_number` fields when running under smart sensor management, and explicitly set a `smart_sensor` indicator in that context or as a separate argument to the `poke()` call. To give some context: several of my sensors need access to the current dag_run `execution_date`, two access XCOM variables via the `context['ti']` entry. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
