[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359210#comment-15359210
 ] 

Chris Lambertus edited comment on APEXMALHAR-2086 at 10/8/16 4:33 AM:
----------------------------------------------------------------------

[~thw]

Sorry Thomas, one thing is not correct. There is an existing 0.8 output operator 
that does exactly-once, but with many more assumptions, which I think makes it 
not very useful.

The way that operator works is that it loads only the last messages from all 
partitions and then compares the replayed messages with them.
This strictly requires that the messages be ordered and that the message type 
be comparable, and neither is the usual case.
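
To make the limitation concrete, here is a minimal sketch of that kind of 
check. It is one possible reading of the approach described above, not the 
actual 0.8 operator code; the function name filter_replay and the plain 
dict/list data shapes are made up for illustration.

```python
def filter_replay(last_written, replayed):
    """Drop replayed messages that are not greater than the last message
    already stored in their partition.

    last_written: dict of partition -> last message found in Kafka
    replayed:     list of (partition, message) in replay order
    (Both shapes are assumptions made for this sketch.)
    """
    to_send = []
    for partition, message in replayed:
        last = last_written.get(partition)
        # This check needs messages that support "<=" and that were
        # written in increasing order; otherwise it is meaningless.
        if last is not None and message <= last:
            continue  # assumed to be in Kafka already
        to_send.append((partition, message))
    return to_send


# Ordered, comparable messages: only the unwritten one is sent.
print(filter_replay({0: 3}, [(0, 2), (0, 3), (0, 4)]))  # [(0, 4)]

# Unordered stream: 1 was never written, yet it is dropped because 1 <= 5.
print(filter_replay({0: 5}, [(0, 1), (0, 6)]))  # [(0, 6)]
```

The second call shows the problem: as soon as the stream is not ordered, a 
message that was never written can be silently skipped.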

That's why we decided to refine the logic in the new 0.9 exactly-once output 
operator.


(Author: hsy541)

> Kafka Output Operator with Kafka 0.9 API
> ----------------------------------------
>
>                 Key: APEXMALHAR-2086
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2086
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Sandesh
>            Assignee: Sandesh
>             Fix For: 3.5.0
>
>
> Goal: 2 Operators for Kafka Output
>       1. Simple Kafka Output Operator 
>             - Supports At-Least-Once 
>             - Exposes the most-used producer properties as class properties
>       2. Exactly-Once Kafka Output (not possible in all cases; will be 
> documented later)
>             
> Design for Exactly Once
> Window Data Manager - Stores the Kafka partition offsets.
> Kafka Key - Used by the operator = AppID#OperatorId
> During recovery, the partially written window is re-created using the 
> following approach:
> Tuples between the largest recovery offsets and the current offset are 
> checked. Based on the key, tuples written by other entities are 
> discarded. 
> Only tuples that are not in the recovered set are emitted.
> Tuples need to be unique within the window.
>       
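
The recovery steps in the quoted design can be sketched roughly as follows. 
This is only an illustration under stated assumptions, not the operator's 
implementation: a partition is modelled as a Python list whose index is the 
offset, the saved recovery offset is taken to be the offset of the last 
record of the last completed window, and the names rebuild_partial_window 
and emit_missing are hypothetical.

```python
def rebuild_partial_window(log, recovery_offset, app_id, operator_id):
    """Collect the values this operator already wrote after the saved offset.

    log:             list of (key, value) records of one partition,
                     where the list index stands in for the Kafka offset
    recovery_offset: offset saved for the last completed window
                     (assumed to point at that window's last record)
    """
    my_key = f"{app_id}#{operator_id}"
    # Scan from just after the saved offset to the current end of the log
    # and keep only records carrying this operator's key; records written
    # by other entities are discarded.
    return {value for key, value in log[recovery_offset + 1:] if key == my_key}


def emit_missing(window_tuples, recovered):
    """Return only the tuples of the replayed window not yet in Kafka."""
    return [t for t in window_tuples if t not in recovered]


log = [
    ("app1#1", "a"),  # offset 0: end of the last completed window
    ("app1#1", "b"),  # offset 1: partially written window starts here
    ("app2#7", "x"),  # offset 2: written by another entity
    ("app1#1", "c"),  # offset 3
]
recovered = rebuild_partial_window(log, 0, "app1", 1)
print(sorted(recovered))                              # ['b', 'c']
print(emit_missing(["b", "c", "d", "e"], recovered))  # ['d', 'e']
```

The sketch also shows why tuples need to be unique within the window: the 
recovered set only records whether a value is present, so if the window 
contained "b" twice and only one copy had been written before the failure, 
the second copy would be dropped as well.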



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
