potiuk commented on issue #19192:
URL: https://github.com/apache/airflow/issues/19192#issuecomment-1003760088
> I'm not sure how airflow is intended to be used, but sometimes people find other use cases for a tool they haven't designed.
>
> We run a task that can take a few hours to collect all the historical data and process it. And then we want the task to run once per day.
This is what Airflow is designed for. I think you are just using it wrongly (or have misconfigured it). It is supposed to handle that case perfectly, and it works this way for thousands of users, so it is your configuration/setup/way of using it that is wrong.
> It appears, from my side, that the airflow server UI can't contact the scheduler while the long task is running, and other DAGs can't be run. Perhaps the scheduler wants my code to yield control back to it frequently (once per day of data, for example), but I prefer to let my own code manage the date ranges, because that's where the unit tests are, and all the heavy lifting is in rust anyway.
No. This is not the case (unless you use the Sequential Executor, which is only supposed to be used for debugging). Airflow is designed to run multiple parallel tasks at a time. You likely have some problem in your Airflow installation/configuration.
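For context: both the executor and the metadata database are chosen in `airflow.cfg` (or via the matching `AIRFLOW__CORE__*` environment variables), and every Airflow process must see the same values. A minimal sketch of the relevant section, assuming the LocalExecutor and a local Postgres database; the credentials are placeholders:
```
[core]
# Anything other than SequentialExecutor can run tasks in parallel;
# LocalExecutor is the simplest multi-process option.
executor = LocalExecutor

# SQLite does not support parallel executors; point this at Postgres.
# (Placeholder credentials; in Airflow 2.2 this key lives in [core].)
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
```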
Questions:
1) Do you actually have the scheduler running at all? Does it have continuous access to the DB?
2) Are you absolutely sure you are not using the SequentialExecutor? What does your `airflow info` say? Can you pastebin its output? (Airflow has a built-in flag to send it to a pastebin service.) Please also make sure you run it in exactly the same environment your scheduler runs in. Most likely you run your scheduler with a different configuration than your webserver, and that causes the problem. (A quick way to check this and point 3 from code is sketched below.)
3) Are you sure you are using Postgres and not SQLite? What does your `airflow info` say?
4) Where is your non-DAG Python code? Did you add the non-DAG files to an `.airflowignore` in Airflow's DAG folder? (A sample `.airflowignore` is sketched below.)
5) Can you upgrade to Airflow 2.2.3 (latest released)? It has built-in warnings in the UI in case you use the Sequential Executor/SQLite.
6) Can you change your DAGs to:
```
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

default_args = {
    "owner": "t4n1o",
    "depends_on_past": False,
    "email": ["[email protected]"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=2),
    # Note: "max_active_runs_per_dag" is an airflow.cfg option, not a task
    # argument, so it has no effect in default_args; the per-DAG setting is
    # the max_active_runs argument of DAG().
    "max_active_runs_per_dag": 1,
}

with DAG(
    "Bitmex_Archives_Mirror",
    default_args=default_args,
    description="Mirror the archives from public.bitmex.com",
    schedule_interval=timedelta(days=1),
    start_date=days_ago(2),
    tags=["raw price history"],
    catchup=False,
) as dag:
    t1 = BashOperator(
        task_id="download_csv_gz_archives",
        bash_command="sleep 1000",
    )
    t2 = BashOperator(
        task_id="process_archives_into_daily_csv_files",
        depends_on_past=False,
        bash_command="sleep 1000",
        retries=3,
    )
    t1 >> t2
```
I just ran it on 2.2.3 and was able to successfully start even 5 parallel runs, with no problems with the scheduler.
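To follow up on points 2) and 3): a minimal sketch for checking which executor and database an environment actually resolves to, assuming Airflow 2.2.x (where both keys live in the `[core]` section). Run it once in the scheduler's environment and once in the webserver's, then compare:
```
from airflow.configuration import conf

# "SequentialExecutor" means tasks run strictly one at a time;
# it is only meant for debugging.
print("executor:", conf.get("core", "executor"))

# A "sqlite://..." URI means SQLite, which only supports the
# SequentialExecutor; use Postgres for parallel runs.
print("sql_alchemy_conn:", conf.get("core", "sql_alchemy_conn"))
```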
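And for point 4), a sample `.airflowignore` placed in the DAG folder so the DAG processor skips non-DAG files. The names below are made-up examples; in Airflow 2.2 each line is treated as a regular expression matched against paths under the DAG folder:
```
# Made-up examples: skip a helper package and any *_utils.py modules.
libs/
.*_utils\.py
```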

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]