Jarek Potiuk created CASSANDRA-17612:
----------------------------------------

             Summary: Cassandra latest (3.0.26) image fails to start with 
health check
                 Key: CASSANDRA-17612
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17612
             Project: Cassandra
          Issue Type: Bug
          Components: CI, Packaging
            Reporter: Jarek Potiuk


Today our CI jobs at Apache Airflow started to fail. When we investigated,
the root cause seemed to be that the Cassandra 3.0 container in those jobs
failed to start (and pass its health check). One of our tests brings up a
number of containers via docker compose, and we use the "cassandra:3.0"
image for that.

The whole test run fails because the cassandra container is unhealthy:

[https://github.com/apache/airflow/runs/6320170343?check_suite_focus=true#step:10:6651]
[https://github.com/apache/airflow/runs/6319805534?check_suite_focus=true#step:10:12629]
[https://github.com/apache/airflow/runs/6319710486?check_suite_focus=true#step:10:6759]

ERROR: for airflow  Container "3bd115315ba7" is unhealthy.
  Encountered errors while bringing up the project.

 3bd115315ba7   cassandra:3.0   "docker-entrypoint.s…"   5 minutes ago   Up 5 minutes (unhealthy)   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp   airflow-integration-postgres_cassandra_1

Our docker-compose entry is here:

[https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/integration-cassandra.yml]

Basically, we run a healthcheck that verifies that Cassandra is up. This
health check worked fine before but seems to fail now. Either we are using
the wrong healthcheck, or there is a bug in the command:

{{    healthcheck:}}
{{      test: "[ $$(nodetool statusgossip) = running ]"}}
{{      interval: 5s}}
{{      timeout: 30s}}
{{      retries: 50}}
{{    restart: always}}

We have mitigated this temporarily by switching to 3.0.25:
[https://github.com/apache/airflow/pull/23522]

Is this a bug in Cassandra, or should we change our health-check command?
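
If the health check does need to change, one possible direction is a check
that does not hide what nodetool prints. The current test leaves the command
substitution unquoted and discards stderr, so any output other than the
single word "running" just makes the test fail without explanation. Below is
a minimal sketch of a more defensive variant, not a confirmed fix. The
function name check_gossip and the GOSSIP_CMD variable are hypothetical;
GOSSIP_CMD exists only so that the function can be tried without a live node.

```shell
#!/bin/sh
# Sketch of a more defensive gossip check (an assumption, not a confirmed fix).
# GOSSIP_CMD is a hypothetical override; by default the real nodetool is used.
check_gossip() {
    # Capture stdout and stderr together so that a nodetool failure is visible.
    output=$(${GOSSIP_CMD:-nodetool statusgossip} 2>&1)
    status=$?

    # A non-zero exit code means nodetool itself failed; report why.
    if [ "$status" -ne 0 ]; then
        echo "nodetool exited with $status: $output" >&2
        return 1
    fi

    # Quote the value so that empty or multi-word output is compared as one
    # string instead of being split into several arguments to `[`.
    if [ "$output" = "running" ]; then
        return 0
    fi

    echo "unexpected gossip status: $output" >&2
    return 1
}
```

A script that defines this function and then calls check_gossip could be
referenced from the compose file with something like
test: ["CMD-SHELL", "/healthcheck.sh"], where /healthcheck.sh is a
hypothetical path inside the container. The messages written to stderr would
then appear in the container's health log.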



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
