ASF GitHub Bot commented on FLINK-6988:

Github user pnowojski commented on the issue:

    Writing records in state would be very costly. It is only a "last resort" 
    > That would imply exactly-once consumers can not read past that 
transaction as long as it is open
    Hmmm, are you sure about this thing? That would mean that Kafka doesn't 
support transactional parallel writes from two different process, which would 
be very strange. Could you point to a source of this information? 
    Resuming transactions is not a part of `KafkaProducer`'s API, however 
Kafka's REST API allows to do that. However I'm aware that it wasn't an 
intention of the authors to do so. Kafka Streams do not need to do that, 
because they achieve exactly-once semantic by using persistent communication 
channels (Kafka topics), so they can easily restart each operator on it's own 
by replay/rewinding every input channel (Kafka topic). This comes with a cost, 
because it makes communication between operators extremely, since every message 
must goes to HDDs at some point. 

> Add Apache Kafka 0.11 connector
> -------------------------------
>                 Key: FLINK-6988
>                 URL: https://issues.apache.org/jira/browse/FLINK-6988
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>    Affects Versions: 1.3.1
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
> Kafka 0.11 (it will be released very soon) add supports for transactions. 
> Thanks to that, Flink might be able to implement Kafka sink supporting 
> "exactly-once" semantic. API changes and whole transactions support is 
> described in 
> [KIP-98|https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging].
> The goal is to mimic implementation of existing BucketingSink. New 
> FlinkKafkaProducer011 would 
> * upon creation begin transaction, store transaction identifiers into the 
> state and would write all incoming data to an output Kafka topic using that 
> transaction
> * on `snapshotState` call, it would flush the data and write in state 
> information that current transaction is pending to be committed
> * on `notifyCheckpointComplete` we would commit this pending transaction
> * in case of crash between `snapshotState` and `notifyCheckpointComplete` we 
> either abort this pending transaction (if not every participant successfully 
> saved the snapshot) or restore and commit it. 

This message was sent by Atlassian JIRA

Reply via email to