Hi All,
My division already uses Temporal for some onboarding workflow management.
We now have a few data pipelines that will include a few ETL jobs, so I am
thinking of using Airflow to schedule and monitor those jobs. However, to
adopt Airflow I need to show a clear advantage over Temporal. I have never
used Temporal, so I have been browsing to put together a comparison. Below
is what I could find, but these points are not enough to make a decision.
Kindly help if you know Temporal well.
*My Requirements:*
1. *ETL nature:*
- Lots of ETL jobs to move data from AWS S3 to the warehouse (Redshift,
Snowflake) with some transformation
- ETL to move staging tables to conformed-schema (more structured)
tables
- ETL to perform a lot of aggregation over tables to create data marts
2. *Scaling:*
- Will have a lot of jobs
- Terabytes of data
- Millions of rows
3. *HA:* Should be highly available
4. *Latency:* Low-latency scheduling and execution
5. *Fault tolerance:*
- Retry on failures
- Auto recovery
6. *Dependency management:* Between modules, DAGs, etc.
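
To make requirements 5 and 6 concrete, here is a minimal, tool-agnostic
Python sketch of what I mean by dependency ordering plus retry on failure.
It uses only the standard library (`graphlib`, Python 3.9+). The task names
(`load_stage`, `conform`, `build_mart`) and the helpers `run_with_retry` and
`run_pipeline` are hypothetical names I made up for illustration; this is
neither Airflow's nor Temporal's API.

```python
import time
from graphlib import TopologicalSorter


def run_with_retry(task, retries=3, base_delay=0.0):
    """Call task(); on an exception, retry up to `retries` more times,
    sleeping base_delay * 2**attempt seconds between attempts."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the failure
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1


def run_pipeline(tasks, deps, retries=3, base_delay=0.0):
    """Run tasks in dependency order.

    tasks: name -> zero-argument callable
    deps:  name -> set of task names that must finish first
    Returns the task names in the order they ran.
    """
    order = []
    for name in TopologicalSorter(deps).static_order():
        run_with_retry(tasks[name], retries=retries, base_delay=base_delay)
        order.append(name)
    return order


# Hypothetical three-step pipeline: S3 -> stage -> conformed -> data mart.
calls = {"conform": 0}


def load_stage():
    return "loaded"


def conform():
    # Fails twice, then succeeds, to simulate a transient error.
    calls["conform"] += 1
    if calls["conform"] < 3:
        raise RuntimeError("transient failure")
    return "conformed"


def build_mart():
    return "mart"


TASKS = {"load_stage": load_stage, "conform": conform, "build_mart": build_mart}
DEPS = {"load_stage": set(), "conform": {"load_stage"}, "build_mart": {"conform"}}
```

Calling `run_pipeline(TASKS, DEPS)` runs the three steps in dependency
order and retries `conform` until it succeeds. Whichever tool we pick needs
to give us this behaviour out of the box, at scale.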
*Comparison:*
*Airflow:*
- Strong integration with the Python ecosystem.
- Rich UI for monitoring and managing workflows.
- Particularly strong when you need a scheduler-driven approach with a
visual DAG representation.
*Temporal*:
- Well-suited for scenarios where we need to manage long-running,
stateful workflows with a programming model that allows for flexibility in
defining complex logic.
- Provides durable and reliable execution by default.
- Allows for complex workflows with sophisticated coordination and state
management.
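
As far as I understand it, "durable execution" means that completed steps
are persisted, so a workflow that crashes resumes from the first unfinished
step instead of starting over. Below is a toy Python sketch of that idea
only, assuming a local JSON file as the state store. It is not Temporal's
API; `run_durable` and the state-file layout are hypothetical.

```python
import json
import os


def run_durable(steps, state_path):
    """Run (name, callable) steps in order, skipping steps already done.

    Completed step names are written to `state_path` after each step, so
    calling this again after a failure resumes at the first unfinished step.
    Returns the names of the steps executed during this call.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)

    executed = []
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run
        fn()  # an exception here leaves earlier progress saved on disk
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)
        executed.append(name)
    return executed
```

If the second of two steps fails, the next call with the same `state_path`
skips the first step and runs only the second. My question is whether this
kind of guarantee matters for batch ETL, or whether Airflow's task-level
retries are enough.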
Thanks,
Coder