Hello Kafka developers, I'm running a 16-node Kafka cluster for my company. Today I tried to expand a topic from 3 partitions to 5. This has always worked before, but this morning it brought down the new broker. It looks as if the append happens before the index file is ready.
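For context, the expansion was done with the standard topic tool, roughly like this (the ZooKeeper address below is a placeholder, not our real host):

```shell
# Expand the topic from 3 to 5 partitions.
# Existing data is not moved; only new partitions are created.
bin/kafka-topics.sh --zookeeper zk1:2181 \
  --alter \
  --topic vacation.schedule.priceInfo.created \
  --partitions 5
```
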
The detailed logs follow. [vacation.schedule.priceInfo.created,3] is the newly assigned partition, and 1088251 is the new broker for this topic. Is there a potential bug here? It doesn't happen every time.

[2016-02-01 09:32:44,558] INFO [ReplicaFetcherManager on broker 1088251] Removed fetcher for partitions [vacation.schedule.priceInfo.created,3] (kafka.server.ReplicaFetcherManager)
[2016-02-01 09:32:45,063] INFO Partition [vacation.schedule.priceInfo.created,3] on broker 1088251: No checkpointed highwatermark is found for partition [vacation.schedule.priceInfo.created,3] (kafka.cluster.Partition)
[2016-02-01 09:33:00,965] FATAL [Replica Manager on Broker 1088251]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log 'vacation.schedule.priceInfo.created-3'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:366)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:68)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /data01/kafka/vacation.schedule.priceInfo.created-3/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)
        ... 22 more