[ 
https://issues.apache.org/jira/browse/KAFKA-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058773#comment-17058773
 ] 

Maksim Larionov commented on KAFKA-9604:
----------------------------------------

Sorry to bother you. Can we expect an answer?

> Kafka cluster crash
> -------------------
>
>                 Key: KAFKA-9604
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9604
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.2.1
>            Reporter: Maksim Larionov
>            Priority: Major
>
> Good day!
> One of the servers in the cluster ran out of disk space. During cleanup, 
> the *.log files of some partitions in the log.dirs directory were deleted 
> by mistake. When the topic's retention time was reached, the file 
> 00000000000007607076.log was not found, and the broker stopped with an 
> error message. That part is OK.
> [2020-02-06 13:32:48,965] INFO [Log partition=ocs.account-balances-12, 
> dir=/data/ocswf/kafka_broker/kafka-data] Found deletable segments with base 
> offsets [7607076] due to retention time 604800000ms breach (kafka.log.Log)
>  [2020-02-06 13:32:48,966] INFO [Log partition=ocs.account-balances-12, 
> dir=/data/ocswf/kafka_broker/kafka-data] Scheduling log segment [baseOffset 
> 7607076, size 131228281] for deletion. (kafka.log.Log)
>  [2020-02-06 13:32:48,979] ERROR Error while deleting segments for 
> ocs.account-balances-12 in dir /data/ocswf/kafka_broker/kafka-data 
> (kafka.server.LogDirFailureChannel)
>  java.nio.file.NoSuchFileException: 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
>  Suppressed: java.nio.file.NoSuchFileException: 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
>  -> 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
>  [2020-02-06 13:32:48,982] INFO [ReplicaManager broker=3] Stopping serving 
> replicas in dir /data/ocswf/kafka_broker/kafka-data 
> (kafka.server.ReplicaManager)
>  [2020-02-06 13:32:48,983] ERROR Uncaught exception in scheduled task 
> 'kafka-log-retention' (kafka.utils.KafkaScheduler)
>  org.apache.kafka.common.errors.KafkaStorageException: Error while deleting 
> segments for ocs.account-balances-12 in dir 
> /data/ocswf/kafka_broker/kafka-data
>  Caused by: java.nio.file.NoSuchFileException: 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
>  Suppressed: java.nio.file.NoSuchFileException: 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log
>  -> 
> /data/ocswf/kafka_broker/kafka-data/ocs.account-balances-12/00000000000007607076.log.deleted
>  ...
>  [2020-02-06 13:32:49,058] INFO Stopping serving logs in dir 
> /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
>  [2020-02-06 13:32:49,078] ERROR Shutdown broker because all log dirs in 
> /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
>  
> Then all the other nodes of the cluster stopped abruptly during the 
> election of partition leaders:
> [2020-02-06 13:32:53,620] ERROR [ReplicaManager broker=1] Error while making 
> broker the leader for partition Topic: ocs.counter-balances; Partition: 40; 
> Leader: Some(3); AllReplicas: 1,2,3,4; InSyncReplicas: 1,2,4 in dir 
> Some(/data/ocswf/kafka_broker/kafka-data) (kafka.server.ReplicaManager)
>  org.apache.kafka.common.errors.KafkaStorageException: Error while writing to 
> checkpoint file 
> /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint
>  Caused by: java.io.FileNotFoundException: 
> /data/ocswf/kafka_broker/kafka-data/ocs.counter-balances-40/leader-epoch-checkpoint.tmp
>  (No such file or directory)
>  [2020-02-06 13:32:53,687] INFO Stopping serving logs in dir 
> /data/ocswf/kafka_broker/kafka-data (kafka.log.LogManager)
>  [2020-02-06 13:32:53,698] ERROR Shutdown broker because all log dirs in 
> /data/ocswf/kafka_broker/kafka-data have failed (kafka.log.LogManager)
> Is this expected behavior?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
