1. Most organizations keep all their Airflow pipelines in a git repo and
sync it to all Airflow boxes (a trivial sync scheme is sketched below).

2. Most likely you'll want to use PythonOperators that leverage hooks
(maybe the Python requests lib along with MySqlHook). XCom wasn't designed
to move data around, just metadata. It's important to have atomic tasks
that are idempotent, so you want to preemptively delete and do the whole
E-T and -L within a single task whenever possible (see the sketch after
this list).

3. If you end up writing lots of PythonOperators that look alike, you may
want to create your own operator. This operator can live in your DAGs
repo, or be contributed back if you think it's generic enough for the rest
of the community to use (a skeleton follows below as well).

4. It's most common to run DAGs on a schedule, but you can also trigger
them on demand, either through the CLI, the Web UI (create a DagRun), or
using a TriggerDagRunOperator from another DAG (example at the end).
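For point 1, the sync itself can be as simple as a cron entry on each box
that fast-forwards the checkout. Purely a sketch, assuming the DAGs folder
is itself the git checkout (the path is hypothetical):

    * * * * * cd /var/lib/airflow/dags && git pull --ff-only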
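To make point 2 concrete, here's a minimal sketch of one atomic,
idempotent E-T-L task. The endpoint URL, table name, and connection id
are all hypothetical, and import paths vary a bit across Airflow
versions (this uses 1.x-style paths):

    import requests
    from datetime import datetime

    from airflow import DAG
    from airflow.hooks.mysql_hook import MySqlHook
    from airflow.operators.python_operator import PythonOperator

    def fetch_and_load(row_id):
        # Extract: one row from the (hypothetical) REST API
        resp = requests.get("https://api.example.com/rows/%s" % row_id)
        resp.raise_for_status()
        record = resp.json()

        # Transform: whatever light processing you need
        row = (record["id"], record["value"].strip())

        # Load, with a preemptive delete so re-running the task leaves
        # the table in the same state; no XCom involved, the whole
        # E-T-L happens inside one task
        hook = MySqlHook(mysql_conn_id="mysql_default")
        hook.run("DELETE FROM target_table WHERE id = %s",
                 parameters=(row[0],))
        hook.insert_rows(table="target_table", rows=[row])

    dag = DAG("rest_to_mysql", start_date=datetime(2016, 6, 1),
              schedule_interval="@daily")

    fetch_and_load_task = PythonOperator(
        task_id="fetch_and_load",
        python_callable=fetch_and_load,
        op_kwargs={"row_id": 42},
        dag=dag)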
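And if that pattern repeats across DAGs (point 3), a rough skeleton for
rolling it into an operator. The class and argument names are made up,
and apply_defaults lives in slightly different modules depending on the
Airflow version:

    import requests
    from airflow.hooks.mysql_hook import MySqlHook
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults

    class RestToMySqlOperator(BaseOperator):
        """Fetch one record over HTTP and load it into MySQL."""

        @apply_defaults
        def __init__(self, endpoint, table,
                     mysql_conn_id="mysql_default", *args, **kwargs):
            super(RestToMySqlOperator, self).__init__(*args, **kwargs)
            self.endpoint = endpoint
            self.table = table
            self.mysql_conn_id = mysql_conn_id

        def execute(self, context):
            resp = requests.get(self.endpoint)
            resp.raise_for_status()
            record = resp.json()
            hook = MySqlHook(mysql_conn_id=self.mysql_conn_id)
            # delete-then-insert keeps the task idempotent
            hook.run("DELETE FROM %s WHERE id = %%s" % self.table,
                     parameters=(record["id"],))
            hook.insert_rows(table=self.table,
                             rows=[(record["id"], record["value"])])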
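For point 4, triggering from the CLI is just `airflow trigger_dag
<dag_id>`. From another DAG, a sketch using TriggerDagRunOperator,
attached to the same dag object as above (the python_callable signature
here is the 1.x one; it receives the context and a DagRunObj):

    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    def decide(context, dag_run_obj):
        # return the DagRunObj to fire the run, or None to skip it
        dag_run_obj.payload = {"triggered_by": context["dag"].dag_id}
        return dag_run_obj

    trigger = TriggerDagRunOperator(
        task_id="trigger_rest_to_mysql",
        trigger_dag_id="rest_to_mysql",
        python_callable=decide,
        dag=dag)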
Max

On Sun, Jun 12, 2016 at 2:35 PM, diwakar bhardwaj <[email protected]>
wrote:

> I am a newbie in airflow, but it looks like airflow would be a perfect
> choice for the workflow management needs of my organization. I have been
> through the documentation but still have some questions about it. It
> would be really great if I could get any tutorials/links that can help
> me answer these questions.
>
> 1. How to integrate with git packages? I've read that we can have a git
>    project of the dag directory. Are there any other schemes through
>    which we can integrate git with airflow?
> 2. Right now most of my data sources are available through web APIs,
>    hence I would need an operator that fetches a single row of data
>    through a REST call and, after processing, puts it into MySQL. Should
>    I make a separate operator for this whole operation? Or should I use
>    HTTPOperator and MySQLOperator communicating with XCom?
> 3. If I go ahead with making a separate operator, can I plug in a
>    non-Python executable to do this job?
> 4. Can we trigger a DAG from the web interface?
>
> --
> Ciao
> Diwakar
