zhongjiajie edited a comment on issue #4779: [AIRFLOW-3958] Support list tasks 
as upstream in chain
URL: https://github.com/apache/airflow/pull/4779#issuecomment-467694805
 
 
   @feluelle I don't think we should remove the `chain` function. And we should add 
this feature to the master branch.
   
   I think a function in `helper` should **help you and make things easy**. Imagine a 
situation where we need to create a number of similar tasks that depend on each other 
ONLY as upstream and downstream. We could use code like
   
   ```py
   t1 = DummyOperator(task_id='t1', dag=dag)
   t2 = DummyOperator(task_id='t2', dag=dag)
   ...
   tn = DummyOperator(task_id='tn', dag=dag)
   t1 >> t2 >> ... >> tn
   ```
   or we could just use the `chain` function
   ```py
   list_for_task = [DummyOperator(task_id='t{}'.format(i), dag=dag) for i in range(1, n + 1)]
   chain(*list_for_task)
   ```
   I prefer the second way rather than the first one.
   
   One classic situation in my daily work is in the data warehouse: I want to transfer 
a number of Hive tables and LOAD them into different output systems. I do it like this
   ```py
   prepare_dw_table_for_ana_system = [
       'prepare_step_1',
       'prepare_step_2',
       'prepare_step_3',
       'prepare_step_4',
   ]
   tasks_prepare_dw_table_for_ana_system = [
       HiveOperator(
           task_id='prepare_dw_table_for_ana_system_{}'.format(step),
           hql='{}.sql'.format(step),
           dag=dag
       ) for step in prepare_dw_table_for_ana_system
   ]
   
   diff_system_adapter = [
       'system_1',
       'system_2',
       'system_3',
       'system_4',
   ]
   tasks_diff_system_adapter = [
       HiveOperator(
           task_id='diff_system_adapter_{}'.format(system),
           hql='{}.sql'.format(system),
           dag=dag
       ) for system in diff_system_adapter
   ]
   
   post_step = [DummyOperator(task_id='post_step_{}'.format(i), dag=dag) for i in range(3)]
   
   chain(*tasks_prepare_dw_table_for_ana_system, tasks_diff_system_adapter, *post_step)
   ```
   I think putting tasks in a list makes the job more meaningful, because 
sometimes we use multiple steps to do one thing. When I refactor the DAG, I only 
have to know what the task group `tasks_prepare_dw_table_for_ana_system` means, 
rather than what each single task like `prepare_dw_table_for_ana_system_xxx` means.
   
   So, I don't think we should remove the `chain` function. And we should make 
`chain(t1, t2, [t3, t4, t5], t6)` work.
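
To make the `chain(t1, t2, [t3, t4, t5], t6)` case concrete, here is a minimal sketch of how such a function could behave. It is not the implementation from this PR: `Task` is a hypothetical stand-in for an operator so the snippet runs without Airflow, and linking two adjacent lists all-to-all is an assumption of this sketch.

```python
class Task:
    """Hypothetical stand-in for an operator; records downstream task ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other.task_id)


def chain(*steps):
    """Link each step to the next; a step is a single task or a list of tasks."""
    # Normalise every step to a list so single tasks and groups share one code path.
    groups = [step if isinstance(step, (list, tuple)) else [step] for step in steps]
    # Walk adjacent pairs of groups and connect every upstream task to every
    # downstream task (assumption: all-to-all when both sides are lists).
    for upstream_group, downstream_group in zip(groups, groups[1:]):
        for up in upstream_group:
            for down in downstream_group:
                up.set_downstream(down)


t1, t2, t3, t4, t5, t6 = (Task('t{}'.format(i)) for i in range(1, 7))
chain(t1, t2, [t3, t4, t5], t6)
```

With this sketch, `t2` fans out to `t3`, `t4` and `t5`, and each of those three fans back in to `t6`.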

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
