Mads Mætzke Tandrup created STORM-1028:
------------------------------------------

             Summary: Eventhub spout meta data
                 Key: STORM-1028
                 URL: https://issues.apache.org/jira/browse/STORM-1028
             Project: Apache Storm
          Issue Type: Bug
            Reporter: Mads Mætzke Tandrup


Event Hub (like Kafka) fits well into event-sourcing architectures as the event
ingest point, with Storm processing the events and feeding downstream stateful
consumers.

Advanced event stream processing, such as replaying parts of a stream, requires
that downstream consumers can synchronise different "stream runs" with their
stateful view, which is itself an aggregation of all previous events. To
re-process the stream deterministically, a consumer must align its view with
the incoming old data, and for that it needs to know the sequenceNumber and
partition of each event.

For example, suppose a bolt calculates total_order_amount for a stream of
orders and emits each order tuple together with the total_order_amount of all
orders up to that point. Replaying an order event should not change the
total_order_amount emitted for it. That is, orders with a higher sequenceNumber
than the order being processed must not be included in total_order_amount.

This synchronisation can be achieved if the bolt has access to the partition
and sequenceNumber from Event Hub.
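
To illustrate the idea, here is a minimal sketch of such a replay-safe total,
written as plain Python rather than as a real Storm bolt. The class name
ReplaySafeOrderTotal and its process() method are hypothetical and not part of
Storm or the Event Hub spout. The sketch assumes that sequence numbers are
ordered within a partition and that the spout passes partition and
sequenceNumber along with each order.

```python
from collections import defaultdict


class ReplaySafeOrderTotal:
    """Hypothetical running total that is stable under replays."""

    def __init__(self):
        # partition id -> {sequence number: order amount}
        self._amounts = defaultdict(dict)

    def process(self, partition, sequence_number, amount):
        seen = self._amounts[partition]
        # The first delivery records the amount; a replay of the same
        # (partition, sequence number) leaves the stored state unchanged.
        seen.setdefault(sequence_number, amount)
        # Only orders at or before this position in the partition count,
        # so later orders never leak into a replayed total.
        return sum(a for seq, a in seen.items() if seq <= sequence_number)


totals = ReplaySafeOrderTotal()
totals.process("0", 1, 10.0)
original = totals.process("0", 2, 5.0)
totals.process("0", 3, 7.5)
replayed = totals.process("0", 2, 5.0)  # replay of the second order
print(original == replayed)
```

The linear scan keeps the sketch short. A real bolt would more likely keep a
sorted structure or prefix sums per partition.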



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
