[
https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060413#comment-14060413
]
Joe Stein commented on KAFKA-1539:
----------------------------------
Did you have log.flush.interval.messages == 1 when doing this? If not, you can
either do something like that (sacrificing performance by forcing a flush on
every message on a single broker) or, instead, place replicas/brokers for the
partition in zones that do not share a power grid. Use replication to achieve
your durability with less sacrifice to performance by using more than one
broker. If you need/want something within a single broker, there are lots of
toggles to use in the broker configuration:
https://kafka.apache.org/documentation.html#brokerconfigs.
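For reference, a sketch of both options as server.properties entries (the keys
are from the broker config page linked above; the values are illustrative, not
recommendations):

    # Option 1: single-broker durability, fsync after every message (slow)
    log.flush.interval.messages=1

    # Option 2: rely on replication instead; keep the default lazy flush
    # and give topics more than one replica
    default.replication.factor=3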
> Due to OS caching Kafka might lose offset files, which causes a full reset of
> data
> --------------------------------------------------------------------------------
>
> Key: KAFKA-1539
> URL: https://issues.apache.org/jira/browse/KAFKA-1539
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.8.1.1
> Reporter: Dmitry Bugaychenko
> Assignee: Jay Kreps
>
> Seen this while testing power failures and disk failures. Due to caching at
> the OS level (e.g. XFS can cache data for 30 seconds), after a failure we got
> offset files of zero length. This dramatically slows down broker startup (it
> has to re-check all segments), and if the high watermark offsets are lost it
> simply erases all data and starts recovering from the other brokers (looks
> funny: first spending 2-3 hours re-checking logs and then deleting them all
> due to the missing high watermark).
> Proposal: introduce offset file rotation. Keep two versions of the offset
> file, write to the oldest, read from the newest valid one. That way the
> offset checkpoint interval can be configured so that at least one file is
> always flushed and valid.
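A minimal sketch of one way to get the invariant the proposal asks for (at
least one complete, valid checkpoint on disk at all times): write the new
contents to a scratch file, force it to disk, then atomically rename it over
the live file. The class, file names, and offsets format below are
hypothetical, not Kafka's actual code:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import java.util.Map;

    public class CheckpointWriter {
        // Persist topic-partition offsets so that a power failure mid-write
        // can never leave a truncated (zero-length) live checkpoint file.
        public static void write(Path dir, Map<String, Long> offsets)
                throws IOException {
            Path tmp  = dir.resolve("replication-offset-checkpoint.tmp");
            Path live = dir.resolve("replication-offset-checkpoint");
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Long> e : offsets.entrySet()) {
                sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
            }
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING)) {
                ch.write(ByteBuffer.wrap(sb.toString()
                        .getBytes(StandardCharsets.UTF_8)));
                ch.force(true); // fsync: push the bytes past the OS page cache
            }
            // Atomic rename: readers see either the old checkpoint or the new
            // one in full, never a partially written file. (A fully robust
            // version would also fsync the parent directory after the move.)
            Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
        }
    }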
--
This message was sent by Atlassian JIRA
(v6.2#6252)