bolkedebruin edited a comment on issue #5254: [AIRFLOW-4473] Add papermill operator
URL: https://github.com/apache/airflow/pull/5254#issuecomment-490224686
 
 
   Papermill is awesome! Consider the following DAG:
   
   ```python
   from datetime import timedelta
   
   from airflow.models import DAG
   from airflow.operators.papermill_operator import PapermillOperator
   from airflow.utils.dates import days_ago
   
   args = {
       'owner': 'airflow',
       'start_date': days_ago(2),
   }
   
   # Run once a day at midnight; fail the run if it takes longer than an hour.
   dag = DAG(
       dag_id='example_papermill_operator', default_args=args,
       schedule_interval='0 0 * * *',
       dagrun_timeout=timedelta(minutes=60))
   
   # Execute the input notebook with `msgs` overridden, writing a separate
   # output notebook per run (execution_date is rendered into the file name).
   run_this = PapermillOperator(
       task_id="run_example_notebook",
       dag=dag,
       input_nb="/tmp/hello_world.ipynb",
       output_nb="/tmp/out-{{ execution_date }}.ipynb",
       parameters={"msgs": "Ran from Airflow at {{ execution_date }}!"}
   )
   
   if __name__ == "__main__":
       dag.cli()
   
   ```
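Each scheduled run renders `{{ execution_date }}` into both the output path and the `msgs` parameter, so every daily run writes its own output notebook next to the unchanged `/tmp/hello_world.ipynb`. The sketch below approximates that rendering with plain string substitution. `render` and `render_run` are hypothetical helpers, not Airflow APIs, and the exact timestamp format Airflow produces may differ from `isoformat()`. The commented call at the end is papermill's `execute_notebook`, which the operator presumably delegates to.

```python
from datetime import datetime, timezone
from typing import Dict

# Templated values copied from the example DAG's PapermillOperator call.
INPUT_NB = "/tmp/hello_world.ipynb"
OUTPUT_NB = "/tmp/out-{{ execution_date }}.ipynb"
PARAMETERS = {"msgs": "Ran from Airflow at {{ execution_date }}!"}


def render(template: str, execution_date: datetime) -> str:
    """Hypothetical stand-in for Airflow's Jinja rendering: only substitutes
    the {{ execution_date }} macro, formatted with isoformat() (an assumption)."""
    return template.replace("{{ execution_date }}", execution_date.isoformat())


def render_run(execution_date: datetime) -> Dict[str, object]:
    """Compute the per-run arguments the task would hand to papermill."""
    return {
        "input_path": render(INPUT_NB, execution_date),
        "output_path": render(OUTPUT_NB, execution_date),
        "parameters": {k: render(v, execution_date) for k, v in PARAMETERS.items()},
    }


run = render_run(datetime(2019, 5, 7, tzinfo=timezone.utc))
print(run["output_path"])  # /tmp/out-2019-05-07T00:00:00+00:00.ipynb

# With papermill installed, the notebook itself would then run via:
# papermill.execute_notebook(run["input_path"], run["output_path"],
#                            parameters=run["parameters"])
```

Because the input path contains no template, every run reads the same source notebook; only the output file and the injected `msgs` value change per run.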
   
   The simple notebook looks like this:
   
   ```python
   # Cell 1: tagged "parameters", so papermill can override msgs
   msgs = "Hello!"
   
   # Cell 2
   print(msgs)
   ```
   Cheap reporting :-)
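Papermill does not edit the "parameters" cell itself; it inserts a new cell right after it that reassigns the variables, so the notebook's defaults stay visible in the output. The sketch below mimics that on an nbformat v4 notebook held as a plain dict. `inject_parameters` is a hypothetical, simplified helper written for illustration, not papermill's implementation; the "injected-parameters" tag and the fallback of inserting at the top when no cell is tagged follow papermill's documented behavior.

```python
import copy
from typing import Any, Dict


def inject_parameters(nb: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an nbformat-v4-style notebook dict with a new code cell,
    tagged "injected-parameters", placed right after the cell tagged "parameters"."""
    out = copy.deepcopy(nb)  # leave the caller's notebook untouched
    cells = out["cells"]
    # Locate the cell the notebook author tagged "parameters".
    idx = next(
        (i for i, c in enumerate(cells)
         if "parameters" in c.get("metadata", {}).get("tags", [])),
        None,
    )
    # One assignment per parameter; repr() gives a valid literal for str/int/float.
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    # Without a tagged cell, fall back to inserting at the top of the notebook.
    cells.insert(0 if idx is None else idx + 1, injected)
    return out


notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": 'msgs = "Hello!"', "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(msgs)", "outputs": [], "execution_count": None},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 2,
}
result = inject_parameters(notebook, {"msgs": "Ran from Airflow at 2019-05-07T00:00:00+00:00!"})
print(result["cells"][1]["source"])
# msgs = 'Ran from Airflow at 2019-05-07T00:00:00+00:00!'
```

When the cells run top to bottom, the injected assignment wins, so `print(msgs)` reports the Airflow-supplied message instead of "Hello!".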
   
   BTW: you will also like this in the context of Amundsen. This operator
   automatically generates lineage information. If you implement your own
   lineage client in Airflow, you can integrate it with Neo4j/Elastic. Atlas is
   already supported ;-) (Overall, the lineage capability in Airflow needs some
   love; the metaprogramming is just annoying... usage needs to guide it.)
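The comment does not show what a custom lineage client would receive. As a rough, hypothetical sketch: assume the client gets a task's inlets (the input notebook plus its parameters) and outlets (the output notebook). Flattening them into (source, relation, target) triples gives a shape that maps onto nodes and relationships in Neo4j or onto flat documents in Elasticsearch. `NotebookRef`, `TaskLineage`, `to_edges`, and the relation names are made up for illustration and are not Airflow's lineage API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NotebookRef:
    """Hypothetical lineage dataset: a notebook URL plus optional parameters."""
    url: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskLineage:
    """Hypothetical record a custom lineage client might receive after a task runs."""
    dag_id: str
    task_id: str
    inlets: List[NotebookRef]
    outlets: List[NotebookRef]


def to_edges(record: TaskLineage) -> List[Tuple[str, str, str]]:
    """Flatten one task's lineage into (source, relation, target) triples."""
    task = f"{record.dag_id}.{record.task_id}"
    edges = [(inlet.url, "READ_BY", task) for inlet in record.inlets]
    edges += [(task, "WROTE", outlet.url) for outlet in record.outlets]
    return edges


record = TaskLineage(
    dag_id="example_papermill_operator",
    task_id="run_example_notebook",
    inlets=[NotebookRef("/tmp/hello_world.ipynb",
                        {"msgs": "Ran from Airflow at 2019-05-07T00:00:00+00:00!"})],
    outlets=[NotebookRef("/tmp/out-2019-05-07T00:00:00+00:00.ipynb")],
)
for edge in to_edges(record):
    print(edge)
```

Keying the task node on `dag_id.task_id` keeps repeated runs attached to one task node, while each run's output notebook becomes its own node.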
   
   
