The primary difference between those cases and the other Sensors is that the
sensors I've seen (EMR Job Flow, S3 Key) don't do anything _other_ than the
sensing task, whereas the tasks you linked to also perform some other action;
it's just that they wait until that operation is complete before returning.
Additionally, my understanding is that Sensors are just an API/Python
class-level convention that makes no difference to the scheduler, i.e.
this is what the BaseSensor class does:
    def execute(self, context):
        started_at = datetime.now()
        while not self.poke(context):
            if (datetime.now() - started_at).total_seconds() > self.timeout:
                if self.soft_fail:
                    raise AirflowSkipException('Snap. Time is OUT.')
                else:
                    raise AirflowSensorTimeout('Snap. Time is OUT.')
            sleep(self.poke_interval)
        logging.info("Success criteria met. Exiting.")
i.e. there's not much difference in effect between an operator that loops and
sleeps itself and one which is a Sensor.
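To make that concrete, here's a minimal standalone sketch (hypothetical names,
not actual Airflow code) showing that an "operator" which submits work and then
polls is the same loop as a Sensor's execute(), just with a submit step in
front:

```python
import time
from datetime import datetime

# Hypothetical exception for illustration; Airflow raises
# AirflowSensorTimeout in the real BaseSensor.
class PollTimeout(Exception):
    pass

def poll_until_done(check, timeout=60, poke_interval=1):
    """Loop-and-sleep until check() returns True -- the Sensor pattern."""
    started_at = datetime.now()
    while not check():
        if (datetime.now() - started_at).total_seconds() > timeout:
            raise PollTimeout('Snap. Time is OUT.')
        time.sleep(poke_interval)

def run_job(submit, check):
    """An 'operator' that submits work, then polls for completion.

    Same shape as a Sensor from the scheduler's point of view: the task
    slot is occupied for the whole duration either way.
    """
    submit()
    poll_until_done(check, timeout=5, poke_interval=0.1)
```

Either way the worker slot is held while the loop runs, which is why the
scheduler can't tell the two styles apart.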
-ash
> On 5 Sep 2017, at 16:14, Richard Baron Penman <[email protected]> wrote:
>
> Hello,
>
> I noticed some operators in contrib (ECS, databricks, dataproc) submit
> their task and then poll until complete:
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/ecs_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/databricks_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataproc_operator.py
>
> Would they be better designed as Sensors?
>
> I ask because I wrote a Sensor for an API and wondering whether there was
> an advantage to the Operator polling approach.
>
> Richard