[ https://issues.apache.org/jira/browse/SAMZA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Riccomini updated SAMZA-252:
----------------------------------
    Description:
A common need in stream processing is to re-process prior messages at a later date. For example, a stream processing job might classify messages using a machine learning algorithm. At some point, the algorithm is updated with a more accurate vector of weights. When this happens, you usually want to re-process past messages to get more accurate results. This is usually solved by running a parallel pipeline from Hadoop.

We have thought extensively about this use case, and should document how to use Samza for re-processing.

> Document stream reprocessing
> ----------------------------
>
>                 Key: SAMZA-252
>                 URL: https://issues.apache.org/jira/browse/SAMZA-252
>             Project: Samza
>          Issue Type: Bug
>          Components: docs
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>             Fix For: 0.7.0, 0.8.0