[ 
https://issues.apache.org/jira/browse/GEARPUMP-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang updated GEARPUMP-33:
-------------------------------
    Description: 
The original discussions are at 
[https://github.com/gearpump/gearpump/issues/1528] and 
[https://github.com/gearpump/gearpump/issues/354].

When a message flows through a stream processing system, the system tries to 
provide some guarantee on message delivery. From the weakest to the strongest, 
these are:

# At most once delivery
  a message is processed zero or one times. Messages can be lost. 

# At least once delivery
   a message is processed one or more times such that at least one of them 
succeeds. Messages cannot be lost but can be duplicated.

# Exactly once delivery
  a message is processed exactly once. Messages can neither be lost nor 
duplicated.

Gearpump tracks message loss between a sender Task and a receiver Task and 
replays the application when a message is lost. If the source is 
TimeReplayable, at-least-once delivery can be guaranteed. If, in addition, user 
state is stored through the PersistentState API, exactly-once delivery is 
guaranteed. If the source is not TimeReplayable, only at-most-once delivery is 
guaranteed. 
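
As a rough illustration of the rules above, the mapping from the two 
conditions to a guarantee can be sketched as below. This is only a sketch in 
Python, not Gearpump code: the names Guarantee and delivery_guarantee are 
hypothetical, and it assumes that at-most-once is the fallback whenever the 
source cannot be replayed.

```python
from enum import Enum


class Guarantee(Enum):
    AT_MOST_ONCE = "at-most-once"
    AT_LEAST_ONCE = "at-least-once"
    EXACTLY_ONCE = "exactly-once"


def delivery_guarantee(source_is_time_replayable: bool,
                       state_is_persistent: bool) -> Guarantee:
    """Return the strongest guarantee implied by the two conditions.

    Hypothetical helper for illustration; not part of any Gearpump API.
    """
    if not source_is_time_replayable:
        # Assumption: without a replayable source, lost messages cannot be
        # read again, so only at-most-once can be offered.
        return Guarantee.AT_MOST_ONCE
    if state_is_persistent:
        # Replayable source plus persisted user state.
        return Guarantee.EXACTLY_ONCE
    # Replayable source only: messages are not lost but may be reprocessed.
    return Guarantee.AT_LEAST_ONCE
```

For example, delivery_guarantee(True, False) yields Guarantee.AT_LEAST_ONCE, 
while persisting state without a replayable source still yields 
Guarantee.AT_MOST_ONCE under the assumption stated above.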

There are several limitations with the current implementation. 

1. If users only require at-most-once delivery, message loss tracking is not 
necessary and we may get better performance without it. 
2. We require the user's data source to be TimeReplayable for 
at-least-once/exactly-once delivery. It would be better if we provided a 
TimeReplayable wrapper for sources that are not replayable (e.g. Twitter).
3. Further, it would be nice if we allowed users to switch between the 
different guarantees through APIs or the dashboard.
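
To make the wrapper idea in item 2 more concrete, here is one possible shape 
for it, sketched in Python. Everything in it is hypothetical (the class name 
ReplayableWrapper, its methods, and the timestamp_of callback are not 
Gearpump APIs), and the in-memory list only stands in for the durable log a 
real implementation would need.

```python
from typing import Any, Callable, Iterable, List, Tuple


class ReplayableWrapper:
    """Records messages from a non-replayable source so they can be re-read.

    Hypothetical sketch: the log lives in memory here, whereas a real
    wrapper would have to persist it to survive a crash.
    """

    def __init__(self, source: Iterable[Any],
                 timestamp_of: Callable[[Any], int]) -> None:
        self._source = iter(source)
        self._timestamp_of = timestamp_of
        self._log: List[Tuple[int, Any]] = []

    def read(self) -> Any:
        """Return the next message, or None when the source is exhausted."""
        msg = next(self._source, None)
        if msg is not None:
            # Record the message before handing it downstream.
            self._log.append((self._timestamp_of(msg), msg))
        return msg

    def replay_from(self, start_time: int) -> List[Any]:
        """Return recorded messages with timestamp >= start_time, in order read."""
        return [m for ts, m in self._log if ts >= start_time]

    def trim_before(self, checkpoint_time: int) -> None:
        """Drop recorded messages older than checkpoint_time."""
        self._log = [(ts, m) for ts, m in self._log if ts >= checkpoint_time]
```

The trade-off in this design is storage: the wrapper has to keep every 
message until a checkpoint makes it safe to trim, which is the cost of 
getting replay from a source that does not offer it.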

This jira is to gather requirements and ideas from the community and users. The 
real work will be divided into subtasks and committed step by step. 
  

  was:
The original discussions are at 
[https://github.com/gearpump/gearpump/issues/1528] and 
[https://github.com/gearpump/gearpump/issues/354].

When a message flows through a stream processing system, the system tries to 
provide some guarantee on message delivery. From the weakest to the strongest, 
these are:

# At most once delivery
  a message is processed zero or one times. Messages can be lost. 

# At least once delivery
   a message is processed one or more times such that at least one of them 
succeeds. Messages cannot be lost but can be duplicated.

# Exactly once delivery
  a message is processed exactly once. Messages can neither be lost nor 
duplicated.

Gearpump tracks message loss between a sender Task and a receiver Task and 
replays the application when a message is lost. If the source is 
TimeReplayable, at-least-once delivery can be guaranteed. If, in addition, user 
state is stored through the PersistentState API, exactly-once delivery is 
guaranteed. If the source is not TimeReplayable, only at-most-once delivery is 
guaranteed. 

There are several limitations with the current implementation. 

1. If users only require at-most-once delivery, message loss tracking is not 
necessary and we may get better performance without it. 
2. We require the user's data source to be TimeReplayable for 
at-least-once/exactly-once delivery. It would be better if we provided a 
TimeReplayable wrapper for sources that are not replayable (e.g. Twitter).
3. Further, it will be nice if we allow users to switch between the different 
guarantees through APIs or the dashboard.

This jira is to gather requirements and ideas from the community and users. The 
real work will be divided into subtasks and committed step by step. 
  


> Message Delivery Guarantee
> --------------------------
>
>                 Key: GEARPUMP-33
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-33
>             Project: Apache Gearpump
>          Issue Type: Improvement
>          Components: streaming
>            Reporter: Manu Zhang
>
> The original discussions are at 
> [https://github.com/gearpump/gearpump/issues/1528] and 
> [https://github.com/gearpump/gearpump/issues/354].
> When a message flows through a stream processing system, the system tries 
> to provide some guarantee on message delivery. From the weakest to the 
> strongest, these are:
> # At most once delivery
>   a message is processed zero or one times. Messages can be lost. 
> # At least once delivery
>    a message is processed one or more times such that at least one of them 
> succeeds. Messages cannot be lost but can be duplicated.
> # Exactly once delivery
>   a message is processed exactly once. Messages can neither be lost nor 
> duplicated.
> Gearpump tracks message loss between a sender Task and a receiver Task and 
> replays the application when a message is lost. If the source is 
> TimeReplayable, at-least-once delivery can be guaranteed. If, in addition, 
> user state is stored through the PersistentState API, exactly-once delivery 
> is guaranteed. If the source is not TimeReplayable, only at-most-once 
> delivery is guaranteed. 
> There are several limitations with the current implementation. 
> 1. If users only require at-most-once delivery, message loss tracking is not 
> necessary and we may get better performance without it. 
> 2. We require the user's data source to be TimeReplayable for 
> at-least-once/exactly-once delivery. It would be better if we provided a 
> TimeReplayable wrapper for sources that are not replayable (e.g. Twitter).
> 3. Further, it would be nice if we allowed users to switch between the 
> different guarantees through APIs or the dashboard.
> This jira is to gather requirements and ideas from the community and users. 
> The real work will be divided into subtasks and committed step by step. 
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
