[
https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730622#comment-14730622
]
ASF GitHub Bot commented on STORM-1028:
---------------------------------------
GitHub user tandrup reopened a pull request:
https://github.com/apache/storm/pull/651
STORM-1028: Eventhub spout meta data
Add "partition" and "seq-number" fields to emitted tuples for event-sourcing
consumers, which need to preserve partition order and be able to replay.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tandrup/storm eventhub-spout-meta-data
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/651.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #651
----
commit d5a7f6081ed65716a6c6fe0121b19597b7d3fc00
Author: rsltrifork <[email protected]>
Date: 2015-07-17T11:27:40Z
Add "partition" and "seq-number" fields to emitted tuples for event-sourcing
consumers, which need to preserve partition order and be able to replay.
Signed-off-by: Mads Mætzke Tandrup <[email protected]>
commit aebe26013792d95caecd9deb285b9fba7df730bd
Author: Mads Mætzke Tandrup <[email protected]>
Date: 2015-07-23T07:43:16Z
Aligning naming of sequence number
commit 8b874265b35a7a6d474e12bbba89d16efe08b459
Author: Mads Mætzke Tandrup <[email protected]>
Date: 2015-07-29T06:16:59Z
Fixing formatting error
----
> Eventhub spout meta data
> ------------------------
>
> Key: STORM-1028
> URL: https://issues.apache.org/jira/browse/STORM-1028
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Mads Mætzke Tandrup
>
> Event Hub (and Kafka) fit well into event-sourcing architectures as the
> event ingest point for later Storm processing that feeds downstream stateful
> consumers. Advanced event stream processing, such as replaying parts of a
> stream, requires that the downstream consumers can synchronise different
> "stream runs" with their stateful view, which can itself be seen as an
> aggregation of all previous events. To set up the right context for
> re-processing the stream deterministically, they need to sync their view
> with the incoming old data. To do this, they need to know each event's
> sequenceNumber and partition.
> For example, suppose a bolt calculates total_order_amount for a stream of
> orders and emits order tuples carrying the total_order_amount calculated
> over all previous orders. Replaying an order event should not change
> total_order_amount; i.e. orders with a higher sequenceNumber than the
> order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition
> and sequenceNumber from Event Hub.
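
The replay-safe total described in the issue could be sketched roughly as follows. This is only an illustration in plain Java with no Storm dependency: the class `OrderTotalTracker`, its method `totalUpTo`, the use of a String partition id, and amounts in cents are all assumptions made for the example, not part of the pull request. It also assumes the total includes the order being processed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of storm-eventhubs): keeps per-partition
 * order amounts keyed by sequence number so that a replayed event yields
 * the same total as its first delivery.
 */
class OrderTotalTracker {
    // partition id -> (sequence number -> order amount in cents)
    private final Map<String, TreeMap<Long, Long>> seen = new HashMap<>();

    /**
     * Records the order (ignored if this sequence number was already seen)
     * and returns the sum of all orders in the same partition whose
     * sequence number is less than or equal to the given one.
     */
    long totalUpTo(String partition, long sequenceNumber, long amountCents) {
        TreeMap<Long, Long> orders =
                seen.computeIfAbsent(partition, p -> new TreeMap<>());
        // A replayed event must not be counted twice.
        orders.putIfAbsent(sequenceNumber, amountCents);

        long total = 0;
        // headMap(..., true) excludes orders with a higher sequence number.
        for (long amount : orders.headMap(sequenceNumber, true).values()) {
            total += amount;
        }
        return total;
    }
}
```

With orders of 100, 250 and 50 arriving on partition "0" as sequence numbers 1, 2 and 3, the totals are 100, 350 and 400. Replaying sequence number 2 afterwards returns 350 again, because the order with sequence number 3 is excluded. A real bolt would read the partition and sequence number from the tuple fields added by this pull request and would also need to bound or persist this state.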