[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064922#comment-15064922 ]
ASF GitHub Bot commented on KAFKA-1860:
---------------------------------------

GitHub user MayureshGharat opened a pull request:

    https://github.com/apache/kafka/pull/697

    KAFKA-1860 The JVM should stop if the underlying file system goes in to Read only mode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MayureshGharat/kafka kafka-1860

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #697

----
commit 5c7f2e749fd8674bae66b6698319181a0f3e9251
Author: Mayuresh Gharat <mgha...@mgharat-ld1.linkedin.biz>
Date:   2015-12-18T18:28:32Z

    Added topic-partition information to the exception message on batch expiry in RecordAccumulator

commit 140d89f33171d665ec27839e8589f2055dc2a34b
Author: Mayuresh Gharat <mgha...@mgharat-ld1.linkedin.biz>
Date:   2015-12-18T19:02:49Z

    Made the exception message more clear explaining why the batches expired

----

> File system errors are not detected unless Kafka tries to write
> ----------------------------------------------------------------
>
>                 Key: KAFKA-1860
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1860
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Mayuresh Gharat
>             Fix For: 0.10.0.0
>
>         Attachments: KAFKA-1860.patch
>
>
> When the disk (a RAID volume holding the caches dir) dies on a Kafka broker, the filesystem typically gets remounted in read-only mode, so when Kafka tries to write to the disk it gets a FileNotFoundException with the read-only errno set (EROFS).
> However, as long as no produce request is received, and hence no log write is attempted, Kafka will not exit on this FATAL error (and when the disk starts working again, Kafka might think some files are gone even though they will reappear later as the RAID comes back online). Instead it keeps spilling exceptions like:
> {code}
> 2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
> java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
>         at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> {code}
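For illustration only, here is a minimal Scala sketch of the kind of guard the ticket title asks for: treat an IOException that signals a read-only file system as fatal and halt the broker JVM instead of merely logging it from the scheduler. The object name FatalDiskErrorGuard, the runOrHalt helper, and the detection of EROFS via the "Read-only file system" message string are all assumptions for this sketch; the actual change in PR #697 may be implemented differently.

{code}
import java.io.IOException

// Illustrative sketch only: FatalDiskErrorGuard and runOrHalt are hypothetical
// names, not part of Kafka. The idea: an IOException indicating a read-only
// file system is treated as fatal and the JVM is halted instead of the error
// being swallowed by the scheduled-task error handler.
object FatalDiskErrorGuard {
  private val ReadOnlyMarker = "Read-only file system"

  def runOrHalt(taskName: String)(task: => Unit): Unit = {
    try {
      task
    } catch {
      case e: IOException if Option(e.getMessage).exists(_.contains(ReadOnlyMarker)) =>
        // Assumed policy: a read-only file system is unrecoverable for this broker,
        // so stop the process and let partition leadership move to other replicas.
        System.err.println(s"FATAL: scheduled task '$taskName' hit a read-only file system: ${e.getMessage}")
        Runtime.getRuntime.halt(1)
    }
  }

  def main(args: Array[String]): Unit = {
    // Example usage around a checkpoint write; the body is a stand-in that
    // simulates the failure seen in the stack trace above.
    runOrHalt("kafka-recovery-point-checkpoint") {
      throw new java.io.FileNotFoundException(
        "/export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)")
    }
  }
}
{code}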