leahecole commented on a change in pull request #272:
URL: https://github.com/apache/airflow-site/pull/272#discussion_r460341387
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
Review comment:
soft preference for "according to" rather than "in line with"
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
Review comment:
nit - capitalize Airflow
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
Review comment:
Two clarifications that might be helpful to the reader here
1. What tells the scheduler that a task is ready?
2. How often does the scheduler run? (Continuously? At a set interval? Does
it depend on the user's environment? Can it be configured? These all might be
things a new Airflower is wondering.)
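For #1, something like this sketch might help (my own illustration, not taken from the post): by default a task only becomes eligible to run once every task it depends on has succeeded, i.e. the default `trigger_rule="all_success"`.
```python
# Illustrative sketch only -- import paths are the Airflow 1.10.x ones and
# may differ in other versions.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG(dag_id="readiness_example",
         start_date=datetime(2020, 7, 1),
         schedule_interval="@daily") as dag:
    task_a = DummyOperator(task_id="task_a")
    # With the default trigger_rule="all_success", the scheduler only
    # considers task_b "ready" once task_a has finished successfully.
    task_b = DummyOperator(task_id="task_b", trigger_rule="all_success")
    task_a >> task_b
```
For #2, my understanding is that the scheduler runs as a continuous loop and its polling intervals can be tuned in `airflow.cfg` (for example `scheduler_heartbeat_sec`), but it would be worth confirming and spelling that out for readers.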
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
Review comment:
nit - grammarwise you should spell out Directed Acyclic Graph, then put
the abbreviation in parentheses - you can then use the abbreviation throughout
the rest of the post :)
second nit - make sure there's a space after the word and before the opening
parenthesis
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
Review comment:
nit - capitalize Airflow
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
Review comment:
Tbh I am not familiar with this default - @potiuk is this just a Breeze
default, or is it the default any time you use Airflow for the first time?
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
Review comment:
also, link to SequentialExecutor documentation pretty please
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
Review comment:
Consider also adding an image of what it looks like in Airflow, perhaps
in graph view?
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
Review comment:
I think it should be "this" and not "these" in the sentence that begins
with "It accomplishes..."
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
Review comment:
If you're feeling like it, you could also consider linking to the
Wikipedia article or another definition of DAG
https://en.wikipedia.org/wiki/Directed_acyclic_graph
But you definitely don't have to - users may just go search and end up there
anyway :)
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
Review comment:
Consider linking the words "breeze environment" to the Breeze.rst doc we
have
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
Review comment:
I think this paragraph might be helped by the addition of a section
about workers; then you can explain how the executor acts in relation to
the workers.
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
Review comment:
nit - capitalize Airflow
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
+
+### Operators
+Operators determine what gets done by a task. Airflow has a lot of builtin
Operators. Each operator
+does a specific thing. There's a BashOperator that executes a bash command,
the PythonOperator which
+calls a python function, AwsBatchOperator which executes a job on AWS Batch
and many more.
Review comment:
Might be worth linking the words "many more" to additional documentation
about operators so folks can peruse the operators we have from various
providers and other projects
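Even a tiny inline example of the operators you mention might help too, something like this (illustrative only; the import paths are the Airflow 1.10.x ones and may differ in other versions):
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

def _greet():
    print("hello from the PythonOperator")

with DAG(dag_id="operator_example",
         start_date=datetime(2020, 7, 1),
         schedule_interval=None) as dag:
    # BashOperator runs a shell command; PythonOperator calls a Python function.
    run_shell = BashOperator(task_id="run_shell", bash_command="echo hello")
    run_python = PythonOperator(task_id="run_python", python_callable=_greet)
    run_shell >> run_python
```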
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
Review comment:
Actually, it might be better to link to the executor docs here, and list
some of the others - that way a new Airflower can read further if they are in a
position where they need to choose an executor.
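For the list, LocalExecutor, CeleryExecutor, and KubernetesExecutor are the usual alternatives worth naming. It might also be nice to show readers how to check which executor their environment is using, something like (illustrative only):
```python
# The executor is configured in airflow.cfg (or via the
# AIRFLOW__CORE__EXECUTOR environment variable); this just reads it back.
from airflow.configuration import conf

print(conf.get("core", "executor"))  # e.g. SequentialExecutor, LocalExecutor, CeleryExecutor
```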
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
+
+### Operators
+Operators determine what gets done by a task. Airflow has a lot of builtin
Operators. Each operator
+does a specific thing. There's a BashOperator that executes a bash command,
the PythonOperator which
Review comment:
instead of "thing" what about saying "task"? I think that'll tie in
nicely with your descriptions from above
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
+
+### Operators
+Operators determine what gets done by a task. Airflow has a lot of builtin
Operators. Each operator
+does a specific thing. There's a BashOperator that executes a bash command,
the PythonOperator which
+calls a python function, AwsBatchOperator which executes a job on AWS Batch
and many more.
+
+### Sensors
Review comment:
Organizational Nit - should this be ####, since a Sensor is a type of
Operator?
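A small sketch of the sensor pattern could also help readers see why they are "special operators" (my own illustration; the base-class import path differs between Airflow versions):
```python
import os

from airflow.sensors.base_sensor_operator import BaseSensorOperator

class FileExistsSensor(BaseSensorOperator):
    """Waits until a file shows up on disk -- poke() is called repeatedly
    by Airflow until it returns True or the sensor times out."""

    def __init__(self, filepath, *args, **kwargs):
        super(FileExistsSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath

    def poke(self, context):
        return os.path.exists(self.filepath)
```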
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
+
+### Operators
+Operators determine what gets done by a task. Airflow has a lot of builtin
Operators. Each operator
+does a specific thing. There's a BashOperator that executes a bash command,
the PythonOperator which
+calls a python function, AwsBatchOperator which executes a job on AWS Batch
and many more.
+
+### Sensors
+Sensors can be described as special operators that are used to monitor a
long-running task.
+Just like Operators, there are many predefined sensors in Airflow.
+
+### Breeze Environment
+The breeze environment is the development environment for Airflow where you
can run tests, build images,
+build documentations and so many other things. There are excellent
+[documentation and
video](https://github.com/apache/airflow/blob/master/BREEZE.rst) on Breeze
environment.
+Please check them out.
+
+### Contributing to Airflow
+Airflow is an open source project, everyone is welcome to contribute. It is
easy to get started thanks
+to the excellent [documentation on how to get
started](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst).
+
+I joined the community about 8 weeks ago through the [Outreachy
Program](https://www.outreachy.org/) and have
+completed about [30
PRs](https://github.com/apache/airflow/pulls/ephraimbuddy). How fast time
flies! You can read
+[about me](https://ephraimbuddy.wordpress.com/2020/05/24/introduction/)
+ and [how I got
into](https://ephraimbuddy.wordpress.com/2020/05/06/experiences-applying-to-outreachy/)
the Outreachy Program.
+
+It has been an amazing experience! Thanks to my mentors
[Jarek](https://github.com/potiuk) and
+[Kaxil](https://github.com/kaxil), and the community members especially
[Kamil](https://github.com/mik-laj)
+and [Tomek](https://github.com/turbaszek) for all their supports. I'm grateful!
Review comment:
nit - support, not supports :)
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
Review comment:
nit - "The" instead of "This", since the command is inline as a sentence
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+---
+title: "Apache Airflow For Newcomers"
+linkTitle: "Apache Airflow For Newcomers"
+author: "Ephraim Anierobi"
+twitter: "ephraimbuddy"
+github: "ephraimbuddy"
+description: ""
+tags: []
+date: "2020-07-16"
+draft: false
+---
+
+Apache Airflow is a platform to programmatically author, schedule, and monitor
workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.
+
+### Authoring Workflow in Apache Airflow.
+Airflow makes it easy to author workflows using python scripts. A DAG(Directed
Acyclic Graph)
+represents a workflow in Airflow. It is a collection of tasks in a way that
shows each task's
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them in line with the task's relationships and dependencies. If task B depends
on the successful
+execution of another task A, it means airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as
+```python
+task_A >> task_B
+```
+Also equivalent to
+```python
+task_A.set_downstream(task_B)
+```
+
+That helps airflow to know that it needs to execute task A before task B.
+Let us now discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing.
+
+### Scheduler
+The scheduler is the component that monitors DAGs and triggers those tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes these by reading the metadata database to check the
status of each task and
+decides what needs to be done. The metadata database is where the status of
all tasks are recorded.
+The status can be one of running, success, failed, etc.
+
+In the breeze environment, the scheduler is started by running the command
`airflow scheduler`.
+
+### Executor
+Executors are responsible for running tasks. They work with the scheduler to
get information about
+what resources are needed to run a task as the task is queued.
+
+By default, airflow uses the SequentialExecutor. However, this executor is
limited and it is the only
+executor that can be used with SQLite.
+
+There are many other executors, the difference is on the resources they have
and how they choose to
+use the resources
+
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
+monitor and troubleshoot DAGs and Tasks.
+
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
+including the duration of the task. The UI makes it possible to view the
task's dependencies in a
+tree view and graph view. You can view task logs in the UI.
+
+The web UI is started with the command `airflow webserver` in the breeze
environment.
+
+### Backend
+By default, Airflow uses the SQLite backend for storing the configuration
information, DAG states,
+and much other useful information. This should not be used in production as
SQLite can cause a data
+loss.
+
+You can use PostgreSQL or MySQL as a backend for airflow. It is easy to change
to PostgreSQL or MySQL.
+
+This command `./breeze --backend mysql` selects MySQL as the backend when
starting the breeze environment.
+
+### Operators
+Operators determine what gets done by a task. Airflow has a lot of builtin
Operators. Each operator
+does a specific thing. There's a BashOperator that executes a bash command,
the PythonOperator which
+calls a python function, AwsBatchOperator which executes a job on AWS Batch
and many more.
+
+### Sensors
+Sensors can be described as special operators that are used to monitor a
long-running task.
+Just like Operators, there are many predefined sensors in Airflow.
+
+### Breeze Environment
+The breeze environment is the development environment for Airflow where you
can run tests, build images,
+build documentations and so many other things. There are excellent
+[documentation and
video](https://github.com/apache/airflow/blob/master/BREEZE.rst) on Breeze
environment.
+Please check them out.
+
+### Contributing to Airflow
Review comment:
YESSS my favorite section!!!!!!!! Thank you for including!!
ALSO welcome!!!!!
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+### Breeze Environment
Review comment:
Ah! Here's the breeze info :) Clearly I hadn't made it this far when I
had made my above comments. It might be worth moving this to the top so that
folks know what you're talking about when you provide commands later.
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+### Webserver
+The webserver is the web interface(UI) for Airflow. The UI is feature-rich. It
makes it easy to
Review comment:
nit - space before the parenthesis with UI
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+### Sensors
+Sensors can be described as special operators that are used to monitor a
long-running task.
+Just like Operators, there are many predefined sensors in Airflow.
Review comment:
consider linking to some, and providing a concrete example
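As one possible concrete example along those lines (a hypothetical sketch, not from the post, assuming Airflow 1.10-style import paths): a custom sensor subclasses BaseSensorOperator and implements poke(), which Airflow calls repeatedly until it returns True.

```python
import os

from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class FileExistsSensor(BaseSensorOperator):
    """Succeeds once the given path exists on disk."""

    @apply_defaults
    def __init__(self, path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.path = path

    def poke(self, context):
        # Called on every poke_interval; returning True ends the wait.
        self.log.info("Checking for %s", self.path)
        return os.path.exists(self.path)
```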
##########
File path: landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md
##########
@@ -0,0 +1,101 @@
+There are many actions you can perform on the UI. You can trigger a task,
monitor the execution
Review comment:
For the examples you mention, some screenshots would enrich this section.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]