[ 
https://issues.apache.org/jira/browse/STORM-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730622#comment-14730622
 ] 

ASF GitHub Bot commented on STORM-1028:
---------------------------------------

GitHub user tandrup reopened a pull request:

    https://github.com/apache/storm/pull/651

    STORM-1028: Eventhub spout meta data

    Add "partition" and "seq-number" fields to emitted tuples for event-sourcing
    consumers, which need to preserve partition order and be able to replay.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tandrup/storm eventhub-spout-meta-data

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #651
    
----
commit d5a7f6081ed65716a6c6fe0121b19597b7d3fc00
Author: rsltrifork <[email protected]>
Date:   2015-07-17T11:27:40Z

    Add "partition" and "seq-number" fields to emitted tuples for event-sourcing
    consumers, which need to preserve partition order and be able to replay.
    
    Signed-off-by: Mads Mætzke Tandrup <[email protected]>

commit aebe26013792d95caecd9deb285b9fba7df730bd
Author: Mads Mætzke Tandrup <[email protected]>
Date:   2015-07-23T07:43:16Z

    Aligning naming of sequence number

commit 8b874265b35a7a6d474e12bbba89d16efe08b459
Author: Mads Mætzke Tandrup <[email protected]>
Date:   2015-07-29T06:16:59Z

    Fixing formatting error

----


> Eventhub spout meta data
> ------------------------
>
>                 Key: STORM-1028
>                 URL: https://issues.apache.org/jira/browse/STORM-1028
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Mads Mætzke Tandrup
>
> Event hub (and Kafka) fit well into event-sourcing architectures as the event 
> ingest point for later Storm processing that feeds downstream stateful consumers.
> Advanced event stream processing, such as replaying parts of a stream, 
> requires that the downstream consumers can synchronise different "stream 
> runs" to their stateful view, which itself can be seen as an aggregation of 
> all previous events. To set up the right context for re-processing the stream 
> in a deterministic way, they need to sync their view with the incoming old 
> data. To be able to do this, they need knowledge of the event sequenceNumber 
> and partition.
> For example, if you have a bolt that calculates total_order_amount for a 
> stream of orders, and emits order tuples with the total_order_amount 
> calculated for all previous orders, replaying an order event should not 
> change total_order_amount. I.e. orders with a higher sequenceNumber than the 
> order being processed should not be included in total_order_amount.
> This synchronisation can be achieved if the bolt has access to the partition 
> and sequenceNumber from eventHub.
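
The total_order_amount example above can be sketched in plain Java, without
any Storm dependency. This is only an illustration under stated assumptions,
not the code from the pull request: the class name ReplayAwareOrderTotal and
its process method are hypothetical, amounts are whole numbers (for example
cents), totals are tracked per partition, first-time events arrive in
sequenceNumber order within a partition, and the history is kept in memory
without any pruning.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: keeps a running order total per partition so that
// replaying an already-seen event returns the total as it was at that event.
class ReplayAwareOrderTotal {
    // partition -> (sequenceNumber -> running total after applying that event)
    private final Map<String, TreeMap<Long, Long>> totalsByPartition = new HashMap<>();

    // Returns total_order_amount for the given event. Orders with a higher
    // sequenceNumber are never included in the returned value.
    long process(String partition, long sequenceNumber, long orderAmount) {
        TreeMap<Long, Long> totals =
                totalsByPartition.computeIfAbsent(partition, p -> new TreeMap<>());

        // Replay: this sequenceNumber was already applied, so return the
        // recorded total and leave the state untouched.
        Long known = totals.get(sequenceNumber);
        if (known != null) {
            return known;
        }

        // First delivery (assumed to arrive in order): extend the running
        // total from the closest lower sequenceNumber.
        Map.Entry<Long, Long> previous = totals.lowerEntry(sequenceNumber);
        long total = (previous == null ? 0L : previous.getValue()) + orderAmount;
        totals.put(sequenceNumber, total);
        return total;
    }
}
```

With this sketch, processing sequence numbers 1 and 2 on one partition and
then replaying sequence number 1 returns the total recorded for event 1,
which is the behaviour the issue asks for. It is only possible because the
consumer sees both the partition and the sequenceNumber of each event.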



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
