[ 
https://issues.apache.org/jira/browse/SAMZA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagadish reassigned SAMZA-255:
------------------------------

    Assignee: Jagadish

> Rewinding Streams within a StreamTask
> -------------------------------------
>
>                 Key: SAMZA-255
>                 URL: https://issues.apache.org/jira/browse/SAMZA-255
>             Project: Samza
>          Issue Type: Wish
>            Reporter: Nicolas Bär
>            Assignee: Jagadish
>            Priority: Minor
>
> The many benefits of Kafka include persistent storage and its resulting 
> possibility to rewind streams to a specific offset. Samza does currently not 
> support rewinding of streams within a StreamTask. I'd like to place this 
> functionality as a feature request and provide two use cases to further 
> describe the benefits of such a feature. Let's consider a general use case to 
> aggregate values within sliding windows.
> 1. Offline-Processing
> In case of offline-processing the sliding window does not correlate to the 
> system time. In this case any node failure will result in samza restoring 
> from a checkpointed offset that most probably does not match the beginning of 
> the most recent sliding window. But in order to gain precise results, one 
> could rewind to the specific offset and process the missing events of the 
> sliding window. The same holds for any use case where the data has to be 
> processed in small batches and these batches do not correspond to the system 
> time.
> 2. Late Arrival
> Messages might get delayed before they are stored into Kafka. In this case 
> one could rewind the offset in order to process older messages corresponding 
> to the same sliding window.
> I'd be happy to further discuss these cases and the proposed feature request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to