[
https://issues.apache.org/jira/browse/SAMZA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jagadish reassigned SAMZA-255:
------------------------------
Assignee: Jagadish
> Rewinding Streams within a StreamTask
> -------------------------------------
>
> Key: SAMZA-255
> URL: https://issues.apache.org/jira/browse/SAMZA-255
> Project: Samza
> Issue Type: Wish
> Reporter: Nicolas Bär
> Assignee: Jagadish
> Priority: Minor
>
> Among Kafka's many benefits are persistent storage and the resulting
> ability to rewind a stream to a specific offset. Samza currently does not
> support rewinding streams within a StreamTask. I'd like to file this
> as a feature request and provide two use cases that illustrate its
> benefits. As a running example, consider the general case of aggregating
> values within sliding windows.
> 1. Offline Processing
> In offline processing, the sliding window does not correlate with the
> system time. Any node failure will result in Samza restoring from a
> checkpointed offset that most likely does not match the beginning of
> the most recent sliding window. To obtain precise results, one could
> rewind to the offset at which that window begins and reprocess the
> window's missing events. The same holds for any use case where the data
> has to be processed in small batches that do not correspond to the system
> time.
> 2. Late Arrival
> Messages might be delayed before they are stored in Kafka. In this case
> one could rewind to an earlier offset in order to process older messages
> that belong to the same sliding window.
> I'd be happy to further discuss these cases and the proposed feature request.
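
To make the offline-processing case concrete, here is a minimal sketch of how a task could track the offset it would need to rewind to. It is not Samza's API: `RewindTracker`, `observe`, and `rewind_offset` are hypothetical names, offsets are assumed to be integers, and event times are assumed to be non-decreasing in offset order. The window is taken to be the interval (t - window_size, t] ending at the latest event time t.

```python
from collections import deque


class RewindTracker:
    """Hypothetical helper: remembers the offset where the current window begins."""

    def __init__(self, window_size):
        self.window_size = window_size
        # (offset, event_time) pairs still inside the current window, oldest first
        self._recent = deque()

    def observe(self, offset, event_time):
        """Record a processed message and drop those that slid out of the window."""
        self._recent.append((offset, event_time))
        cutoff = event_time - self.window_size
        while self._recent and self._recent[0][1] <= cutoff:
            self._recent.popleft()

    def rewind_offset(self):
        """Offset of the oldest message in the current window, or None if empty."""
        return self._recent[0][0] if self._recent else None


tracker = RewindTracker(window_size=10)
for offset, event_time in [(100, 0), (101, 4), (102, 9), (103, 12), (104, 15)]:
    tracker.observe(offset, event_time)

# A checkpoint taken at offset 104 would skip offsets 102 and 103, which
# still belong to the window ending at event time 15.
print(tracker.rewind_offset())
```

In this sketch the tracked offset would have to be stored alongside the checkpoint, so that after a failure the task could rewind to it instead of resuming from the checkpointed offset. The late-arrival case would need a looser ordering assumption than the one made here.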
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)