Dong Lin commented on KAFKA-6188:
[~manme...@gmail.com] Thanks for the detailed information, and sorry for the
late reply; I was not monitoring this Jira ticket. My understanding of
your comment is that FileSystemException is expected if Kafka tries to
delete or rename a file that is still open. I checked the current logic in the
Kafka 1.1.0 code. For log-compacted topics, `replaceSegments()` is called,
which in turn calls `asyncDeleteSegment()` and `segment.changeFileSuffixes`.
Similarly, for non-log-compacted topics, `deleteSegment()` calls
`asyncDeleteSegment()`, which also modifies the file without first closing it.
So if this were a general problem, it should affect every user constantly, right?
Given that this does not happen on the Linux machines at LinkedIn, the issue
seems to be specific to Windows or NAS, which require the file to be closed
before it is modified. Does this make sense?
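To make the hypothesis concrete, here is a minimal standalone sketch (not Kafka code) that renames a file while a handle to it is still open, then repeats the rename after closing the handle. The class name `RenameOpenFileDemo`, both helper methods, and the `.deleted` suffix are hypothetical and chosen only for illustration. The assumption being tested is that the first rename succeeds on Linux but may throw `FileSystemException` on Windows or some NAS mounts, depending on how the file was opened.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameOpenFileDemo {

    // Hypothetical helper: rename a file while a RandomAccessFile on it is still open.
    // Returns true if the rename succeeded, false if the OS refused it.
    static boolean renameWhileOpen(Path dir) throws IOException {
        Path segment = Files.createTempFile(dir, "segment-", ".log");
        Path renamed = segment.resolveSibling(segment.getFileName() + ".deleted");
        try (RandomAccessFile handle = new RandomAccessFile(segment.toFile(), "rw")) {
            try {
                Files.move(segment, renamed);
                return true;
            } catch (FileSystemException e) {
                // Assumed failure mode on platforms that lock open files.
                System.out.println("rename refused: " + e.getMessage());
                return false;
            }
        }
    }

    // Hypothetical helper: close the handle first, then rename.
    // Returns true if the renamed file exists afterwards.
    static boolean renameAfterClose(Path dir) throws IOException {
        Path segment = Files.createTempFile(dir, "segment-", ".log");
        Path renamed = segment.resolveSibling(segment.getFileName() + ".deleted");
        RandomAccessFile handle = new RandomAccessFile(segment.toFile(), "rw");
        handle.close();
        Files.move(segment, renamed);
        return Files.exists(renamed);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        System.out.println("rename while open succeeded: " + renameWhileOpen(dir));
        System.out.println("rename after close succeeded: " + renameAfterClose(dir));
    }
}
```

If the first line reports a refusal on the affected Windows 10 machine while both renames succeed on Linux, that would be consistent with the platform-specific explanation above; it would not by itself show which Kafka code path holds the file open.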
> Broker fails with FATAL Shutdown - log dirs have failed
> Key: KAFKA-6188
> URL: https://issues.apache.org/jira/browse/KAFKA-6188
> Project: Kafka
> Issue Type: Bug
> Components: clients, log
> Affects Versions: 1.0.0, 1.0.1
> Environment: Windows 10
> Reporter: Valentina Baljak
> Priority: Blocker
> Labels: windows
> Attachments: Segments are opened before deletion,
> kafka_2.10-0.10.2.1.zip, output.txt
> Just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The
> test environment is very simple, with only one producer and one consumer.
> Initially, everything started fine, and standalone tests worked as expected.
> However, running my code, Kafka clients fail after approximately 10 minutes.
> Kafka won't start after that and it fails with the same error.
> Deleting logs helps to start again, and the same problem occurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000
> ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of
> 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092.
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor
> threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving
> replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions are
> offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed
> fetcher for partitions (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped
> fetcher for partitions because they are in the failed log dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)