denismatveev opened a new issue #15471:
URL: https://github.com/apache/airflow/issues/15471


   I am trying to set up a three-node Airflow cluster. Each node runs an Airflow 
scheduler, an Airflow worker, and an Airflow webserver, along with Celery, a 
RabbitMQ cluster, and a Postgres multi-master cluster (implemented with Bucardo). 
Software versions:
   
    - Airflow 2.0.1
    - PostgreSQL 13.2
    - Ubuntu 20.04
    - Python 3.8.5
    - Celery 4.4.7
    - Bucardo 5.6.0
    - RabbitMQ 3.8.2
   
   I run into a problem when starting the Airflow scheduler.
   
   When I launch the first scheduler (with an empty database), it starts 
successfully. While that scheduler is running, launching a second scheduler on 
another machine (I also tried the same machine) fails with the following error:
   
   ```
   sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "job_pkey"
   DETAIL:  Key (id)=(25) already exists.
   
   [SQL: INSERT INTO job (dag_id, state, job_type, start_date, end_date, latest_heartbeat, executor_class, hostname, unixname) VALUES (%(dag_id)s, %(state)s, %(job_type)s, %(start_date)s, %(end_date)s, %(latest_heartbeat)s, %(executor_class)s, %(hostname)s, %(unixname)s) RETURNING job.id]
   [parameters: {'dag_id': None, 'state': 'running', 'job_type': 'SchedulerJob', 'start_date': datetime.datetime(2021, 4, 21, 7, 39, 20, 429478, tzinfo=Timezone('UTC')), 'end_date': None, 'latest_heartbeat': datetime.datetime(2021, 4, 21, 7, 39, 20, 429504, tzinfo=Timezone('UTC')), 'executor_class': 'CeleryExecutor', 'hostname': 'hostname', 'unixname': 'root'}]
   (Background on this error at: http://sqlalche.me/e/13/gkpj)
   
   ```
   After a few more launch attempts, the scheduler eventually starts. I assume 
the id is incremented on each attempt until the insert into the database succeeds:
   
   ```
   airflow=> select * from job order by state;
    id | dag_id |  state  |   job_type   |          start_date           | end_date |       latest_heartbeat        | executor_class |   hostname   | unixname
   ----+--------+---------+--------------+-------------------------------+----------+-------------------------------+----------------+--------------+----------
    26 |        | running | SchedulerJob | 2021-04-21 07:39:22.243721+00 |          | 2021-04-21 07:39:22.243734+00 | CeleryExecutor | machine name | root
    25 |        | running | SchedulerJob | 2021-04-21 07:39:14.515009+00 |          | 2021-04-21 07:39:19.632811+00 | CeleryExecutor | machine name | root
   
   ```
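
A minimal sketch of the assumption above, not Airflow or Bucardo code: it 
supposes that each node draws ids from its own local sequence, that replication 
copies rows without advancing that sequence, and that a failed insert still 
consumes an id. `Node`, `replicate_from` and `insert_with_retry` are hypothetical 
names used only for illustration.

```python
import itertools


class UniqueViolation(Exception):
    """Raised when an insert reuses an id that already exists."""


class Node:
    """Toy stand-in for one database node (hypothetical, for illustration only)."""

    def __init__(self):
        self._seq = itertools.count(1)  # node-local id sequence (assumption)
        self.rows = {}

    def replicate_from(self, other):
        # Assumption: replication copies rows but leaves the local sequence untouched.
        self.rows.update(other.rows)

    def insert(self, payload):
        new_id = next(self._seq)  # the id is consumed even if the insert fails
        if new_id in self.rows:
            raise UniqueViolation(f"Key (id)=({new_id}) already exists.")
        self.rows[new_id] = payload
        return new_id


def insert_with_retry(node, payload, max_attempts=10):
    """Retry the insert; return the id that worked and the number of failed attempts."""
    failures = 0
    for _ in range(max_attempts):
        try:
            return node.insert(payload), failures
        except UniqueViolation:
            failures += 1
    raise RuntimeError("gave up after repeated id collisions")


node_a, node_b = Node(), Node()
node_a.insert("SchedulerJob on node A")  # takes id 1 on node A
node_a.insert("log entry on node A")     # takes id 2 on node A
node_b.replicate_from(node_a)            # node B now holds ids 1 and 2

new_id, failures = insert_with_retry(node_b, "SchedulerJob on node B")
print(new_id, failures)
```

Under these assumptions the insert on the second node fails once for every 
replicated id its local sequence has not yet passed, then succeeds, which would 
match a scheduler that only starts after several attempts.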
   
   There is also a warning about the log table (when the second and subsequent 
schedulers do start successfully):
   ```
   WARNING - Failed to log action with (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "log_pkey"
   DETAIL:  Key (id)=(40) already exists.
   ```
   Even after I succeed in launching a scheduler on another node (i.e. schedulers 
are running on two machines), an attempt to launch another instance on either of 
those machines fails with the same error.
   
   I understand why the scheduler cannot insert data into the table, but I assume 
something is wrong with the architecture or the table structure.



