[ 
https://issues.apache.org/jira/browse/KAFKA-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321156#comment-15321156
 ] 

Jay Kreps commented on KAFKA-3788:
----------------------------------

You are correct: Unix guarantees neither the ordering of writes nor even the 
contents of unflushed portions of a file (it is legal for the unflushed region 
to hold garbage bytes until you call fsync). However, I believe this is handled 
in the design of the log. The flush is asynchronous, but we recover from the 
last flush point and checksum all unflushed messages. If any message fails the 
checksum validation, we truncate the log from that point on and restore the 
missing data from the replicas.
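
To make the recovery step concrete, here is a minimal sketch of "scan from the 
last flush point, checksum each message, truncate at the first failure". The 
record framing (4-byte length, 4-byte CRC32, payload) and the names 
encode_record, valid_end, and recover are assumptions made for illustration; 
this is not Kafka's actual on-disk format or code, and the restore from 
replicas is not shown.

```python
import struct
import zlib

# Hypothetical framing, NOT Kafka's format: payload length, then CRC32 of payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload as length + checksum + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_end(log: bytes, recovery_point: int) -> int:
    """Return the byte offset at which the log should be truncated.

    Everything before recovery_point is assumed to have been flushed and is
    trusted; everything after it is validated record by record.
    """
    pos = recovery_point
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log):
            break  # record runs past the end of the file: torn write
        if zlib.crc32(log[pos + HEADER.size:end]) != crc:
            break  # checksum mismatch: garbage or partially written bytes
        pos = end
    return pos


def recover(log: bytes, recovery_point: int) -> bytes:
    """Drop the first invalid record and everything after it."""
    return log[:valid_end(log, recovery_point)]
```

In this sketch a torn or corrupted tail is simply cut off; in the design 
described above, whatever was cut would then be fetched again from the replicas.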

Synchronous flush is not an option, as it would stall the roll of the file for 
the time required to flush. For a 1GB segment this could take quite some time.
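
As a rough sketch of why the flush is scheduled rather than awaited, the roll 
below creates the next segment and hands the old segment's fsync to a 
background thread, so the caller is not blocked for the duration of the flush. 
The names flusher, flush, and roll are hypothetical and only loosely mirror the 
behaviour discussed here; this is not the kafka.log.Log implementation.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical single background thread that performs the slow flushes.
flusher = ThreadPoolExecutor(max_workers=1)


def flush(path: str) -> str:
    """Force the file's contents to stable storage; this call may block."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    return path


def roll(old_segment: str, new_segment: str) -> Future:
    """Create the next segment and schedule the old one's flush without waiting."""
    open(new_segment, "ab").close()  # the new segment is ready for appends now
    return flusher.submit(flush, old_segment)
```

A caller could keep appending to the new segment while the returned future is 
pending, and treat the old segment as safely flushed only once the future 
completes. Until then, a crash falls back on the checksum-based recovery 
described above.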

> Potential message lost when switching to new segment
> ----------------------------------------------------
>
>                 Key: KAFKA-3788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3788
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
>            Reporter: Arkadiusz Firus
>            Assignee: Jay Kreps
>            Priority: Minor
>              Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When a new segment is needed, the method roll() of class kafka.log.Log is 
> invoked. It prepares the new segment and schedules an _asynchronous_ flush of 
> the previous segment.
> This asynchronous call can lead to a problematic situation. As far as I know, 
> neither Linux nor Windows guarantees that files are persisted to disk in the 
> same order as they were written. This means that records from the new segment 
> can be flushed before the old ones, which in case of a power outage can lead 
> to gaps between records.
> Changing the asynchronous invocation to a synchronous one would solve the 
> problem, because it guarantees that all records from the previous segment are 
> persisted to disk before we write any record to the new segment.
> I am guessing that the asynchronous invocation was chosen to increase 
> performance, but switching between segments does not happen often, so it is 
> not a big gain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
