[ https://issues.apache.org/jira/browse/KAFKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064922#comment-15064922 ]
ASF GitHub Bot commented on KAFKA-1860:
---------------------------------------

GitHub user MayureshGharat opened a pull request:

    https://github.com/apache/kafka/pull/697

    KAFKA-1860 The JVM should stop if the underlying file system goes in to Read only mode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MayureshGharat/kafka kafka-1860

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #697

----
commit 5c7f2e749fd8674bae66b6698319181a0f3e9251
Author: Mayuresh Gharat <mgha...@mgharat-ld1.linkedin.biz>
Date:   2015-12-18T18:28:32Z

    Added topic-partition information to the exception message on batch expiry in RecordAccumulator

commit 140d89f33171d665ec27839e8589f2055dc2a34b
Author: Mayuresh Gharat <mgha...@mgharat-ld1.linkedin.biz>
Date:   2015-12-18T19:02:49Z

    Made the exception message more clear explaining why the batches expired

----

> File system errors are not detected unless Kafka tries to write
> ----------------------------------------------------------------
>
>                 Key: KAFKA-1860
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1860
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Mayuresh Gharat
>             Fix For: 0.10.0.0
>
>         Attachments: KAFKA-1860.patch
>
>
> When the disk (a RAID volume holding the caches dir) dies on a Kafka broker, the filesystem typically gets remounted in read-only mode, so when Kafka tries to write to the disk it gets a FileNotFoundException with the read-only errno set (EROFS).
> However, as long as no produce request is received, and hence no log write is attempted, Kafka will not exit on this FATAL error (and when the disk starts working again, Kafka might think some files are gone even though they will reappear later as the RAID comes back online). Instead it keeps spilling exceptions like:
> {code}
> 2015/01/07 09:47:41.543 ERROR [KafkaScheduler] [kafka-scheduler-1] [kafka-server] [] Uncaught exception in scheduled task 'kafka-recovery-point-checkpoint'
> java.io.FileNotFoundException: /export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:156)
>         at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
> {code}
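For illustration only, here is a minimal Scala sketch of the kind of guard the ticket title asks for: treat an IOException that signals a read-only file system as fatal and halt the broker JVM instead of merely logging it from the scheduler. The object name FatalDiskErrorGuard, the runOrHalt helper, and the detection of EROFS via the "Read-only file system" message string are all assumptions for this sketch; the actual change in PR #697 may be implemented differently.

{code}
import java.io.IOException

// Illustrative sketch only: FatalDiskErrorGuard and runOrHalt are hypothetical
// names, not part of Kafka. The idea: an IOException indicating a read-only
// file system is treated as fatal and the JVM is halted instead of the error
// being swallowed by the scheduled-task error handler.
object FatalDiskErrorGuard {
  private val ReadOnlyMarker = "Read-only file system"

  def runOrHalt(taskName: String)(task: => Unit): Unit = {
    try {
      task
    } catch {
      case e: IOException if Option(e.getMessage).exists(_.contains(ReadOnlyMarker)) =>
        // Assumed policy: a read-only file system is unrecoverable for this broker,
        // so stop the process and let partition leadership move to other replicas.
        System.err.println(s"FATAL: scheduled task '$taskName' hit a read-only file system: ${e.getMessage}")
        Runtime.getRuntime.halt(1)
    }
  }

  def main(args: Array[String]): Unit = {
    // Example usage around a checkpoint write; the body is a stand-in that
    // simulates the failure seen in the stack trace above.
    runOrHalt("kafka-recovery-point-checkpoint") {
      throw new java.io.FileNotFoundException(
        "/export/content/kafka/i001_caches/recovery-point-offset-checkpoint.tmp (Read-only file system)")
    }
  }
}
{code}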