[GitHub] [airflow] mjpieters opened a new issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

GitBox Tue, 27 Oct 2020 09:46:27 -0700


mjpieters opened a new issue #11893:
URL: https://github.com/apache/airflow/issues/11893



   I was looking into smart sensors, to see if our sensors could be adapted to 
their use.
   
   Unfortunately, the current implementation has some issues, specifically 
around the execution and poke contexts:
   
   - The term _execution context_ is used differently from what I expected. 
It's a number of sensor attributes that are needed to do proper housekeeping 
around the sensor lifetime, transferred from the original sensor operator to 
the `SensorInstance` entry so you don't actually have to load the original 
sensor class each time. But the `BaseOperator.execute()` method is also passed 
a context, created from `TaskInstance.get_template_context()`, which is passed 
on verbatim to the `BaseSensorOperator.poke()` method. The term used implies 
that the smart sensor execution context would be the same context, but it is 
not.
   
   - The term _poke context_ is confusing because it is used for two purposes: 
to create an instance of the sensor class inside of the smart sensor, **and** 
to pass to the `.poke()` method when executing the wrapped sensor.
   
   The latter makes it quite hard to repurpose an existing sensor, one that 
uses elements from the standard execution context: the object produced by 
`TaskInstance.get_template_context()`. If I were to add things like, say, 
`execution_date` to `poke_context_fields`, I also have to update my sensor 
class `__init__` method to ignore those extra parameters.
   
   Moreover, it makes _no logical sense_ to pass in the same mapping to both 
construct the instance, and to the poke method? The initialiser can just set 
the right values on `self`, and poke(...) can just access anything pertinent 
directly from there.
   
   To me, the execution context is really the _smart sensor context_, and there 
really needs to be a separate _operator  context_, that differs from the _poke 
context_. The operator context is what you use to create the sensor task in a 
DAG, and the poke context captures the dag-run specific information. The _smart 
sensor_ and _operator_ contexts could be sourced together (it doesn't really 
matter to the _operator_ that the smart sensor needs to keep some of 
information separate for its own bookkeeping; just pick up the `poke_interval`, 
`retries`, etc. entries once you collected the context).
   
   And the poke context, as passed to the `poke` method, should include some 
standard entries. It's fine if what is 'standard' differs from the template 
context, as long as it is easy enough for the `poke()` implementation to alter 
how it gets its information based on what is present. E.g. pass in SensorWork's 
`dag_id`, `execution_date` and `try_number` fields when running under smart 
sensor management, and explicitly set a `smart_sensor` indicator in that 
context or as a separate argument to the `poke()` call.
   
   To give some context: several of my sensors need access to the current 
dag_run `execution_date`, two access XCOM variables via the `context['ti']` 
entry.
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] mjpieters opened a new issue #11893: Smart sensor contexts appear confusing, redundant and incomplete

Reply via email to