Mads Mætzke Tandrup created STORM-1028:
------------------------------------------
Summary: Eventhub spout meta data
Key: STORM-1028
URL: https://issues.apache.org/jira/browse/STORM-1028
Project: Apache Storm
Issue Type: Bug
Reporter: Mads Mætzke Tandrup
Event Hub (and Kafka) fit well into event-sourcing architectures as the event
ingest point, with Storm processing the events for downstream stateful consumers.
Advanced event stream processing, such as replaying parts of a stream, requires
that the downstream consumers can synchronise different "stream runs" with their
stateful view, which can itself be seen as an aggregation of all previous
events. To re-process the stream deterministically, they need to sync their view
with the incoming old data, and for that they need to know each event's
sequenceNumber and partition.
For example, suppose a bolt calculates total_order_amount for a stream of
orders and emits each order tuple together with the total_order_amount of all
previous orders. Replaying an order event should not change total_order_amount:
orders with a higher sequenceNumber than the order being processed should not be
included in total_order_amount.
This synchronisation can be achieved if the bolt has access to the partition and
sequenceNumber from Event Hub.
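
The order example above could be sketched roughly as follows. This is plain
Java with no Storm or Event Hub dependency, and it is only an illustration of
the idea, not the spout's or any bolt's actual API. The class name
ReplayAwareOrderTotals, the method process and its parameters are hypothetical,
and the sketch assumes that sequenceNumber is ordered within a single partition,
so state is kept per partition.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that a stateful bolt could delegate to.
class ReplayAwareOrderTotals {

    // partition id -> (sequenceNumber -> order amount), sorted by sequenceNumber.
    // Assumption: sequence numbers are only comparable within one partition.
    private final Map<String, TreeMap<Long, Long>> amountsByPartition = new HashMap<>();

    // Records the order and returns total_order_amount for it: the sum of all
    // known orders in the same partition with a strictly lower sequenceNumber.
    long process(String partition, long sequenceNumber, long amount) {
        TreeMap<Long, Long> amounts =
                amountsByPartition.computeIfAbsent(partition, p -> new TreeMap<>());
        // Re-inserting a replayed event overwrites the same key, so it is idempotent.
        amounts.put(sequenceNumber, amount);

        long total = 0;
        // headMap(key, false) is the view of entries strictly below the key,
        // so orders that arrived later in the stream are never counted.
        for (long previous : amounts.headMap(sequenceNumber, false).values()) {
            total += previous;
        }
        return total;
    }

    public static void main(String[] args) {
        ReplayAwareOrderTotals totals = new ReplayAwareOrderTotals();
        System.out.println(totals.process("0", 1L, 100L)); // first run
        System.out.println(totals.process("0", 2L, 250L));
        System.out.println(totals.process("0", 3L, 50L));
        // Replay of the second order: the result matches the first run,
        // because the order with sequenceNumber 3 is excluded.
        System.out.println(totals.process("0", 2L, 250L));
    }
}
```

Without the partition and sequenceNumber on the tuple, the bolt could not tell
a replayed order from a new one, and the replay would include the later order
in the total.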
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)