[ https://issues.apache.org/jira/browse/SAMZA-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Kleppmann updated SAMZA-252:
-----------------------------------
    Attachment: SAMZA-252.1.patch

Here's a draft for a page on reprocessing for the docs: https://reviews.apache.org/r/20989/

> Document stream reprocessing
> ----------------------------
>
>                 Key: SAMZA-252
>                 URL: https://issues.apache.org/jira/browse/SAMZA-252
>             Project: Samza
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Martin Kleppmann
>             Fix For: 0.7.0, 0.8.0
>
>         Attachments: SAMZA-252.1.patch
>
>
> One need in stream processing is to re-process prior messages at some
> later date. An example of this is a stream processing job that
> classifies messages in some way using a machine learning algorithm. At some
> point, the algorithm will be updated with a more accurate vector of weights.
> When this happens, you usually want to re-process past messages to get more
> accurate results. This is usually solved by running a parallel pipeline from
> Hadoop.
> We have thought extensively about this use case, and should document how to
> use Samza for re-processing.

--
This message was sent by Atlassian JIRA
(v6.2#6252)