Github user rsltrifork commented on the pull request:
https://github.com/apache/storm/pull/651#issuecomment-124075611
Event Hub (and Kafka) fit well into event-sourcing architectures as the event
ingest point for Storm processing that feeds downstream stateful consumers.
Advanced event stream processing, such as replaying parts of a stream,
requires that downstream consumers can synchronise different "stream runs"
against their stateful view, which itself can be seen as an aggregation of all
previous events. To set up the right context for re-processing the stream
deterministically, they need to reconcile their view with the incoming old
data, and to do that they need to know each event's sequenceNumber and
partition.
For example, if you have a bolt that calculates total_order_amount over a
stream of orders and emits each order tuple together with the
total_order_amount of all previous orders, replaying an order event should not
change total_order_amount. That is, orders with a higher sequenceNumber than
the order being processed should not be included in total_order_amount.
This synchronisation can only be achieved if the bolt has access to the
partition and sequenceNumber from Event Hub.
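
To make the idea concrete, here is a minimal sketch of such a bolt. It assumes
the spout emits "partition", "sequenceNumber" and "amount" fields with each
tuple; those field names and the amount field are hypothetical, not part of the
current eventhub spout. On pre-1.0 Storm the package prefix would be
backtype.storm rather than org.apache.storm.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical bolt: keeps a running total_order_amount and ignores
    // replayed events by tracking the highest sequenceNumber seen per partition.
    public class OrderTotalBolt extends BaseBasicBolt {
        private final Map<String, Long> lastSeenSeq = new HashMap<>();
        private double totalOrderAmount = 0.0;

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String partition = tuple.getStringByField("partition");
            long seq = tuple.getLongByField("sequenceNumber");
            double amount = tuple.getDoubleByField("amount");

            Long lastSeen = lastSeenSeq.get(partition);
            if (lastSeen == null || seq > lastSeen) {
                // New event: fold it into the aggregate and remember its position.
                totalOrderAmount += amount;
                lastSeenSeq.put(partition, seq);
            }
            // Replayed event (seq <= lastSeen): totalOrderAmount is left unchanged.

            collector.emit(new Values(partition, seq, totalOrderAmount));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("partition", "sequenceNumber", "total_order_amount"));
        }
    }

Without the partition and sequenceNumber fields on the tuple, the bolt has no
way to tell a replayed order from a new one, which is the point of the request
above.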