Manu Zhang created GEARPUMP-33:
----------------------------------
Summary: Message Delivery Guarantee
Key: GEARPUMP-33
URL: https://issues.apache.org/jira/browse/GEARPUMP-33
Project: Apache Gearpump
Issue Type: Improvement
Components: streaming
Reporter: Manu Zhang
The original discussions are at
[https://github.com/gearpump/gearpump/issues/1528] and
[https://github.com/gearpump/gearpump/issues/354].
When a message flows through a stream processing system, the system will try to
provide some guarantee on message delivery From the weakest to strongest, there
are.
# At most once delivery
a message is processed zero or one times. Messages can be lost.
# At least once delivery
a message is processed one or more times such that at least one of them
succeeds. Messages can not be lost but can be duplicated.
# Exactly once delivery
a message is processed exactly once. Messages can neither be lost nor
duplicated.
Gearpump tracks message loss between a sender Task and a receiver Task and
replays the application on message loss. If the source is TimeReplayable, then
at-least-once delivery can be guaranteed. In addition, if user state is stored
through PersistentState API, then exactly-once delivery is guaranteed.
Otherwise, at-most-once delivery is guaranteed.
There are several limitations with the current implementation.
1. If users only require at-most-once delivery, message loss track is not
necessary and we may get better performance without it.
2. We require user's data source to be TimeReplayable for
at-least-once/exactly-once delivery. It would be better if we provide a
TimeReplayable wrapper when user source is not replayable (e.g. Twitter)
3. Further, it will be nice if we allow users to switch between the different
guarantees through APIs or dashboard.
This jira is to gather requirements and ideas from the community and users. The
real work will be divided into subtasks and committed step by step.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)