Maksim Larionov created KAFKA-9604:
--------------------------------------
Summary: Cluster crash
Key: KAFKA-9604
URL: https://issues.apache.org/jira/browse/KAFKA-9604
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.2.1
Reporter: Maksim Larionov
Good afternoon!
One of the servers in the cluster ran out of disk space.
During cleanup, some *.log files belonging to some replicas in log.dirs were
deleted by mistake. When the retention time was reached, log cleanup ran and
could not find the physical file 00000000000007607076.log. The broker shut
down abnormally.
[2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12,
dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base
offsets [7607076] due to retention time 604800000ms breach (kafka.log.Log)
[2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12,
dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset
7607076, size 131228281] for deletion. (kafka.log.Log)
[2020-02-06 13:32:48,979] ERROR Error while deleting segments for
ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
(kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException:
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
Suppressed: java.nio.file.NoSuchFileException:
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
->
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
[2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving
replicas in dir /data/ocswf/kafka_broker/kafka-data
(kafka.server.ReplicaManager)
[2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task
'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting
segments for ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data
Caused by: java.nio.file.NoSuchFileException:
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
Suppressed: java.nio.file.NoSuchFileException:
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
->
/data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
...
[2020-02-06 13:32:49,058] INFO Stopping serving logs in dir
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
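A segment missing in this way might be detectable before retention cleanup reaches it. The following is only a minimal sketch, not part of Kafka: it assumes that the .index or .timeindex files of a deleted segment are still present in the partition directory, and it reports base offsets that have an index file but no matching .log file. The function name find_orphaned_segments is hypothetical.

```python
import re
from pathlib import Path

# Kafka names segment files by a 20-digit, zero-padded base offset.
SEGMENT_RE = re.compile(r"^(\d{20})\.(log|index|timeindex)$")


def find_orphaned_segments(log_dir):
    """Return {partition directory name: [base offsets lacking a .log file]}.

    Hypothetical helper: it only inspects file names and assumes the index
    files of a deleted segment survived the manual cleanup.
    """
    orphans = {}
    for partition_dir in sorted(Path(log_dir).iterdir()):
        if not partition_dir.is_dir():
            continue
        # Collect the set of extensions seen for each base offset.
        extensions_by_offset = {}
        for entry in partition_dir.iterdir():
            match = SEGMENT_RE.match(entry.name)
            if match:
                offset, extension = match.groups()
                extensions_by_offset.setdefault(offset, set()).add(extension)
        # A base offset with index files but no .log file is suspicious.
        missing = sorted(
            int(offset)
            for offset, extensions in extensions_by_offset.items()
            if "log" not in extensions
        )
        if missing:
            orphans[partition_dir.name] = missing
    return orphans
```

For the directory in this report it would be called as find_orphaned_segments("/data/ocswf/kafka_broker/kafka-data"). If the index files were deleted together with the .log file, this check finds nothing.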
Then all the other nodes in the cluster shut down abnormally during partition
leader election:
[2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making
broker the leader for partition Topic: ocs.counter-balances; Partition: 40;
Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir
Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while writing to
checkpoint file
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
Caused by: java.io.FileNotFoundException:
/data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
(No such file or directory)
They could not rewrite leader-epoch-checkpoint and stopped for that reason:
[2020-02-06 13:32:53,687] INFO Stopping serving logs in dir
/data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
[2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in
/data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
Is this behavior expected?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)