[
https://issues.apache.org/jira/browse/CASSANDRA-17612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538181#comment-17538181
]
Klaus Kierer commented on CASSANDRA-17612:
------------------------------------------
{quote}Docker is just lagging behind then, though 3+ months seems a bit
excessive.
{quote}
No, it's not that Docker is lagging behind; rather, the Cassandra images are
rebuilt as soon as their base image is recreated, e.g. because of Java updates.
Java 11.0.15 / 8u332 was released on April 19th, and it took some time until
Adoptium had all builds ready. As soon as they were, the Eclipse Temurin Docker
images were recreated, and as a result the Cassandra images were rebuilt as
well.
For details, please have a look at
[https://github.com/docker-library/faq#an-images-source-changed-in-git-now-what];
the relevant part reads as follows:
{quote}...These refreshed base images also means that any other image in the
Official Images program that is FROM them will also be rebuilt...
{quote}
This means that the Cassandra Docker image changes over time even when there is
no new Cassandra release, which has its own risks, as we can see with the latest
Java update.
> Cassandra latest (3.0.26) image fails to start with health check
> ----------------------------------------------------------------
>
> Key: CASSANDRA-17612
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17612
> Project: Cassandra
> Issue Type: Bug
> Components: CI, Packaging
> Reporter: Jarek Potiuk
> Priority: Normal
> Attachments: Screenshot 2022-05-06 at 12.19.05.png
>
>
> Today our CI at Apache Airflow started to fail, and when we investigated, the
> root cause appeared to be that the Cassandra 3.0 image in our CI jobs failed
> to start (and pass health checks). One of our tests usually brings up a number
> of images via docker compose, and we used the "cassandra:3.0" image for that.
> We noticed 3.0.26 was released 15 hours ago, so this is almost certainly some
> 3.0.25 -> 3.0.26 difference.
> The whole test run fails because the Cassandra container is unhealthy:
> [https://github.com/apache/airflow/runs/6320170343?check_suite_focus=true#step:10:6651]
> [https://github.com/apache/airflow/runs/6319805534?check_suite_focus=true#step:10:12629]
> [https://github.com/apache/airflow/runs/6319710486?check_suite_focus=true#step:10:6759]
> {{ERROR: for airflow Container "3bd115315ba7" is unhealthy.}}
> {{Encountered errors while bringing up the project.}}
> {{3bd115315ba7 cassandra:3.0 "docker-entrypoint.s…" 5 minutes ago Up 5
> minutes (unhealthy) 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp
> airflow-integration-postgres_cassandra_1}}
>
> The logs from the Cassandra container do not show anything suspicious:
> {{INFO 08:45:22 Using Netty Version:
> [netty-buffer=netty-buffer-4.0.44.Final.452812a,
> netty-codec=netty-codec-4.0.44.Final.452812a,
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
> netty-common=netty-common-4.0.44.Final.452812a,
> netty-handler=netty-handler-4.0.44.Final.452812a,
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
> netty-transport=netty-transport-4.0.44.Final.452812a,
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
> netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]}}
> {{INFO 08:45:22 Starting listening for CQL clients on /0.0.0.0:9042
> (unencrypted)...}}
> {{INFO 08:45:23 Not starting RPC server as requested. Use JMX
> (StorageService->startRPCServer()) or nodetool (enablethrift) to start it}}
> {{INFO 08:45:23 Startup complete}}
> {{INFO 08:45:24 Created default superuser role ‘cassandra’}}
>
> Our docker-compose entry is here:
> [https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/integration-cassandra.yml]
> Basically, we run a healthcheck that checks whether Cassandra is up. This
> health check worked fine before but seems to fail now. Either we are using the
> wrong healthcheck or there is some bug in the command:
> {code:yaml}
> healthcheck:
>   test: "[ $$(nodetool statusgossip) = running ]"
>   interval: 5s
>   timeout: 30s
>   retries: 50
> restart: always
> {code}
> We mitigated it by switching to 3.0.25 temporarily
> [https://github.com/apache/airflow/pull/23522]
> Is this a bug in Cassandra, or should we change our health-check
> command?
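On the question of changing the health-check command: in a compose file, `$$` is Compose's escape for a literal `$`, so inside the container the shell evaluates `[ $(nodetool statusgossip) = running ]`. Below is a minimal sketch of a more explicit variant in POSIX sh. It does not address whatever changed in the rebuilt image. It only separates the two failure modes (nodetool itself exiting with an error versus gossip reporting something other than `running`), so the reason shows up in the container's health-check log (visible via `docker inspect`). The function name `gossip_running` is hypothetical, and the sketch assumes, like the original test, that `nodetool statusgossip` prints exactly `running` when gossip is up.

```shell
#!/bin/sh
# Hypothetical, more explicit variant of the compose health check (a sketch).
# Assumption: `nodetool statusgossip` prints exactly "running" when gossip is
# up, which is also what the original `[ ... = running ]` test relies on.

gossip_running() {
  # For a plain assignment, the exit status is that of the command
  # substitution, so a failing nodetool is caught here instead of
  # surfacing later as a confusing `[` syntax error.
  # nodetool's own stderr is left alone so it reaches the health-check log.
  status="$(nodetool statusgossip)" || {
    echo "healthcheck: nodetool statusgossip exited with an error" >&2
    return 1
  }
  # Quoting keeps "not running" a single word, so `[` compares strings
  # instead of failing with "too many arguments".
  if [ "$status" = "running" ]; then
    return 0
  fi
  echo "healthcheck: gossip status is '$status'" >&2
  return 1
}
```

Used as a standalone health-check script, the file would end with a call to `gossip_running`, whose exit status then becomes the script's exit status; the compose `healthcheck` would point at that script instead of the inline `[ ... ]` test.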
--
This message was sent by Atlassian Jira
(v8.20.7#820007)