The primary difference between those cases and the other Sensors is that the 
sensors I've seen (EMR Job Flow, S3 Key) don't do anything _other_ than the 
sensing task, whereas the operators you linked to also perform some other 
action; they just wait until that operation is complete before returning.

Additionally, my understanding is that Sensors are just an API/Python 
class-level convention that makes no difference to the scheduler; i.e. this is 
essentially what the BaseSensor class's execute() does:


from datetime import datetime
from time import sleep
import logging

def execute(self, context):
  started_at = datetime.now()
  # Keep calling poke() until it returns True or the timeout is exceeded.
  while not self.poke(context):
    if (datetime.now() - started_at).total_seconds() > self.timeout:
      if self.soft_fail:
        raise AirflowSkipException('Snap. Time is OUT.')
      else:
        raise AirflowSensorTimeout('Snap. Time is OUT.')
    sleep(self.poke_interval)
  logging.info("Success criteria met. Exiting.")

i.e. there's not much difference in effect between an operator that loops and 
sleeps itself and one which is a Sensor.
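To make the equivalence concrete, here's a standalone sketch in plain Python 
(not the real Airflow classes; the names `poke`, `timeout`, `poke_interval`, 
and `SensorTimeout` just mirror the loop above) of the same pattern written 
both as a sensor and as a submit-then-poll operator:

```python
import time

class SensorTimeout(Exception):
    """Stand-in for AirflowSensorTimeout in this sketch."""

def run_sensor(poke, timeout=5.0, poke_interval=0.1):
    """The Sensor pattern: loop on poke() until it returns True or we time out."""
    started_at = time.monotonic()
    while not poke():
        if time.monotonic() - started_at > timeout:
            raise SensorTimeout('Snap. Time is OUT.')
        time.sleep(poke_interval)

def run_polling_operator(submit, check, timeout=5.0, poke_interval=0.1):
    """An 'operator that loops and sleeps itself': submit work, then run the
    exact same poll loop against it, as the ECS/Databricks operators do."""
    job = submit()
    run_sensor(lambda: check(job), timeout=timeout, poke_interval=poke_interval)
    return job
```

The only structural difference is where the polling loop lives: in a separate 
sensor task, or inlined after the submit step inside one operator.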

-ash

> On 5 Sep 2017, at 16:14, Richard Baron Penman <[email protected]> wrote:
> 
> Hello,
> 
> I noticed some operators in contrib (ECS, databricks, dataproc) submit
> their task and then poll until complete:
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/ecs_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/databricks_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataproc_operator.py
> 
> Would they be better designed as Sensors?
> 
> I ask because I wrote a Sensor for an API and wondering whether there was
> an advantage to the Operator polling approach.
> 
> Richard
