potiuk commented on issue #19192:
URL: https://github.com/apache/airflow/issues/19192#issuecomment-1003760088
> I'm not sure how airflow is intended to be used, but sometimes people find other use cases for a tool they haven't designed.
>
> We run a task that can take a few hours to collect all the historical data and process it. And then we want the task to run once per day.
This is what Airflow is designed for. I think you are just using it wrongly (or have misconfigured it). It is supposed to handle that case perfectly, and it works this way for thousands of users, so it is your configuration/setup/way of using it that is wrong.
> It appears, from my side, that the airflow server UI can't contact the scheduler while the long task is running, and other DAGs can't be run. Perhaps the scheduler wants my code to yield control back to it frequently (once per day of data, for example), but I prefer to let my own code manage the date ranges, because that's where the unit tests are, and all the heavy lifting is in rust anyway.
No. This is not the case (unless you use the Sequential Executor, which is only supposed to be used for debugging). Airflow is designed to run multiple parallel tasks at a time. You likely have some problem in your Airflow installation/configuration.
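For context: both the executor and the metadata database are chosen in `airflow.cfg` (or via the matching `AIRFLOW__CORE__*` environment variables), and every Airflow process must see the same values. A minimal sketch of the relevant section, assuming the LocalExecutor and a local Postgres database; the credentials are placeholders:
```
[core]
# Anything other than SequentialExecutor can run tasks in parallel;
# LocalExecutor is the simplest multi-process option.
executor = LocalExecutor

# SQLite does not support parallel executors; point this at Postgres.
# (Placeholder credentials; in Airflow 2.2 this key lives in [core].)
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
```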
Questions:
1) Do you actually have the scheduler running at all? Does it have continuous access to the DB?
2) Are you absolutely sure you are not using the SequentialExecutor? What does your `airflow info` say? Can you pastebin its output? (Airflow has a built-in flag to send it to a pastebin service.) Please also make sure you run it in exactly the same environment your scheduler runs in. Most likely you run your scheduler with a different configuration than your webserver, and that causes the problem. (A quick way to check this and point 3 from code is sketched below.)
3) Are you sure you are using Postgres and not SQLite? What does your `airflow info` say?
4) Where is your non-DAG Python code? Did you add the non-DAG files to an `.airflowignore` in Airflow's DAG folder? (A sample `.airflowignore` is sketched below.)
5) Can you upgrade to Airflow 2.2.3 (latest released)? It has built-in warnings in the UI in case you use the Sequential Executor/SQLite.
6) Can you change your DAGs to:
```
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

default_args = {
    "owner": "t4n1o",
    "depends_on_past": False,
    "email": ["[email protected]"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=2),
    # Note: "max_active_runs_per_dag" is an airflow.cfg option, not a task
    # argument, so it has no effect in default_args; the per-DAG setting is
    # the max_active_runs argument of DAG().
    "max_active_runs_per_dag": 1,
}

with DAG(
    "Bitmex_Archives_Mirror",
    default_args=default_args,
    description="Mirror the archives from public.bitmex.com",
    schedule_interval=timedelta(days=1),
    start_date=days_ago(2),
    tags=["raw price history"],
    catchup=False,
) as dag:
    t1 = BashOperator(
        task_id="download_csv_gz_archives",
        bash_command="sleep 1000",
    )
    t2 = BashOperator(
        task_id="process_archives_into_daily_csv_files",
        depends_on_past=False,
        bash_command="sleep 1000",
        retries=3,
    )
    t1 >> t2
```
I just ran it on 2.2.3 and was able to successfully start even 5 parallel runs, with no problems with the scheduler.
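To follow up on points 2) and 3): a minimal sketch for checking which executor and database an environment actually resolves to, assuming Airflow 2.2.x (where both keys live in the `[core]` section). Run it once in the scheduler's environment and once in the webserver's, then compare:
```
from airflow.configuration import conf

# "SequentialExecutor" means tasks run strictly one at a time;
# it is only meant for debugging.
print("executor:", conf.get("core", "executor"))

# A "sqlite://..." URI means SQLite, which only supports the
# SequentialExecutor; use Postgres for parallel runs.
print("sql_alchemy_conn:", conf.get("core", "sql_alchemy_conn"))
```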
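And for point 4), a sample `.airflowignore` placed in the DAG folder so the DAG processor skips non-DAG files. The names below are made-up examples; in Airflow 2.2 each line is treated as a regular expression matched against paths under the DAG folder:
```
# Made-up examples: skip a helper package and any *_utils.py modules.
libs/
.*_utils\.py
```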

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]