[ 
https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527961#comment-17527961
 ] 

Haruki Okada commented on KAFKA-13855:
--------------------------------------

Hmm, sorry, it sounds like I overstepped.

Yes, it seems we need to dig into this further. Please never mind for now.

> FileNotFoundException: Error while rolling log segment for topic partition in dir
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13855
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13855
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.6.1
>            Reporter: Sergey Ivanov
>            Priority: Major
>
> Hello,
> We faced an issue where one of the Kafka brokers in the cluster failed with an
> exception and restarted:
>  
> {code:java}
> [2022-04-13T09:51:44,563][ERROR][category=kafka.server.LogDirFailureChannel] Error while rolling log segment for prod_data_topic-7 in dir /var/opt/kafka/data/1
> java.io.FileNotFoundException: /var/opt/kafka/data/1/prod_data_topic-7/00000000000026872377.index (No such file or directory)
>       at java.base/java.io.RandomAccessFile.open0(Native Method)
>       at java.base/java.io.RandomAccessFile.open(Unknown Source)
>       at java.base/java.io.RandomAccessFile.<init>(Unknown Source)
>       at java.base/java.io.RandomAccessFile.<init>(Unknown Source)
>       at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183)
>       at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>       at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>       at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>       at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508)
>       at kafka.log.Log.$anonfun$roll$8(Log.scala:1916)
>       at kafka.log.Log.$anonfun$roll$2(Log.scala:1916)
>       at kafka.log.Log.roll(Log.scala:2349)
>       at kafka.log.Log.maybeRoll(Log.scala:1865)
>       at kafka.log.Log.$anonfun$append$2(Log.scala:1169)
>       at kafka.log.Log.append(Log.scala:2349)
>       at kafka.log.Log.appendAsLeader(Log.scala:1019)
>       at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:984)
>       at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:972)
>       at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:883)
>       at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
>       at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>       at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>       at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>       at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>       at scala.collection.TraversableLike.map(TraversableLike.scala:273)
>       at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
>       at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>       at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:871)
>       at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:571)
>       at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:605)
>       at kafka.server.KafkaApis.handle(KafkaApis.scala:132)
>       at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70)
>       at java.base/java.lang.Thread.run(Unknown Source)
> [2022-04-13T09:51:44,812][ERROR][category=kafka.log.LogManager] Shutdown broker because all log dirs in /var/opt/kafka/data/1 have failed {code}
> There is no additional useful information in the logs, just one warning before
> this error:
> {code:java}
> [2022-04-13T09:51:44,720][WARN][category=kafka.server.ReplicaManager] [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions 
> __consumer_offsets-22,prod_data_topic-5,__consumer_offsets-30,
> ....
> prod_data_topic-0 and stopped moving logs for partitions  because they are in the failed log directory /var/opt/kafka/data/1.
> [2022-04-13T09:51:44,720][WARN][category=kafka.log.LogManager] Stopping serving logs in dir /var/opt/kafka/data/1{code}
> The topic configuration is:
> {code:java}
> /opt/kafka $ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic prod_data_topic
> Topic: prod_data_topic        PartitionCount: 12      ReplicationFactor: 3    Configs: min.insync.replicas=2,segment.bytes=1073741824,max.message.bytes=15728640,retention.bytes=4294967296
>         Topic: prod_data_topic        Partition: 0    Leader: 3       Replicas: 3,1,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 1    Leader: 1       Replicas: 1,2,3 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 2    Leader: 2       Replicas: 2,3,1 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 3    Leader: 3       Replicas: 3,2,1 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 4    Leader: 1       Replicas: 1,3,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 5    Leader: 2       Replicas: 2,1,3 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 6    Leader: 3       Replicas: 3,2,1 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 7    Leader: 1       Replicas: 1,3,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 8    Leader: 2       Replicas: 2,1,3 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 9    Leader: 3       Replicas: 3,1,2 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 10   Leader: 1       Replicas: 1,2,3 Isr: 3,2,1
>         Topic: prod_data_topic        Partition: 11   Leader: 2       Replicas: 2,3,1 Isr: 3,2,1 {code}
> A day before this happened, we set the "retention.bytes" broker config to
> 5368709120 (the previous value was 6442450944), but we are not sure whether
> that is related. Current custom broker config:
>  
> {code:java}
> log.retention.check.interval.ms=300000
> log.segment.bytes=1073741824
> log.retention.bytes=4294967296
> log.retention.hours=40
> message.max.bytes=15728640
> replica.lag.time.max.ms=30000
> min.insync.replicas=2
> delete.topic.enable=true
> replica.fetch.max.bytes=15728640
> default.replication.factor=3
> num.replica.fetchers=2 
> {code}
>  
> Could you please help investigate what could have caused this failure? We
> have no ideas: no topics or files were cleaned up, and no other maintenance
> procedures were performed on the disk.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
