[ https://issues.apache.org/jira/browse/SAMZA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Kleppmann updated SAMZA-252:
-----------------------------------
    Attachment: SAMZA-252.2.patch

Thanks for your helpful feedback. I've reworked and extended the page, and updated the RB. I've also attached an updated (v2) patch, rebased onto master.

> Document stream reprocessing
> ----------------------------
>
>                 Key: SAMZA-252
>                 URL: https://issues.apache.org/jira/browse/SAMZA-252
>             Project: Samza
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Martin Kleppmann
>             Fix For: 0.7.0
>
>         Attachments: SAMZA-252.1.patch, SAMZA-252.2.patch
>
>
> One need in stream processing is to re-process prior messages at some
> later date. An example is a stream processing job that classifies
> messages in some way using a machine learning algorithm. At some
> point, the algorithm will be updated with a more accurate vector of weights.
> When this happens, you usually wish to re-process past messages to get more
> accurate results. Usually this is solved by running a parallel pipeline from
> Hadoop.
> We have thought extensively about this use case, and should document how to
> use Samza for re-processing.

--
This message was sent by Atlassian JIRA
(v6.2#6252)