-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20989/#review42108
-----------------------------------------------------------

This is good. As a high-level comment, I think it might be good to introduce this topic by saying that there are several ways to handle reprocessing, give each a name, and then discuss pros and cons and document the checkpoint stuff. I think it would be worth covering:

1. "Simple rewind": Just delete your state and change the job checkpoint to 0. (pro: super simple, con: some downtime).
2. "Parallel rewind": Start a second copy of the job in parallel, writing to a new output topic. Switch consumers when it has caught up. Alternatively, both copies can share an output topic if the consumers can handle that. (pro: still pretty simple, con: need to change the consumer too).
3. "Lambda architecture": Reimplement the job in Hadoop and do the reprocessing there. (con: need to somehow maintain the logic and operate in two systems).

- Jay Kreps


On May 1, 2014, 10:14 p.m., Martin Kleppmann wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20989/
> -----------------------------------------------------------
>
> (Updated May 1, 2014, 10:14 p.m.)
>
>
> Review request for samza.
>
>
> Repository: samza
>
>
> Description
> -------
>
> SAMZA-252: Add page on reprocessing to the docs.
>
>
> Diffs
> -----
>
>   docs/_layouts/default.html 0a5ad9f63110c68424360773b9fcd005e4a059a9
>   docs/learn/documentation/0.7.0/index.html 7806baf71bee61e5316d5bc627fee219012d3375
>   docs/learn/documentation/0.7.0/jobs/logging.md 6bb6bf4b3630165159acc47e4cfb8e1afe6659cb
>   docs/learn/documentation/0.7.0/jobs/reprocessing.md PRE-CREATION
>
> Diff: https://reviews.apache.org/r/20989/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Martin Kleppmann
>
>
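
For illustration only, the difference between the "simple rewind" and "parallel rewind" options named in the comment above could be sketched with a toy in-memory model. Nothing here is Samza or Kafka API: the `Job` class, the `outputs` dict standing in for output topics, and the topic names `counts-v1` / `counts-v2` are all hypothetical, and the "job" is just a word counter.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Job:
    """Toy stream job (hypothetical): counts messages and tracks its own checkpoint."""
    output_topic: str
    checkpoint: int = 0                               # next input offset to read
    state: Counter = field(default_factory=Counter)

    def run(self, input_log, outputs, max_messages=None):
        """Process messages from the checkpoint onward, then publish the state."""
        end = len(input_log)
        if max_messages is not None:
            end = min(end, self.checkpoint + max_messages)
        for offset in range(self.checkpoint, end):
            self.state[input_log[offset]] += 1
            self.checkpoint = offset + 1
        outputs[self.output_topic] = dict(self.state)

    def caught_up(self, input_log):
        return self.checkpoint == len(input_log)


def simple_rewind(job, input_log, outputs):
    """Option 1: delete state, set the checkpoint to 0, reprocess in place.
    The job's output is incomplete until it has caught up again (the downtime)."""
    job.state.clear()
    job.checkpoint = 0
    job.run(input_log, outputs)


def parallel_rewind(input_log, outputs, new_topic, batch=2):
    """Option 2: a second copy of the job writes to a new output topic.
    The old topic stays readable; consumers switch once this returns."""
    new_job = Job(output_topic=new_topic)
    new_job.run(input_log, outputs, max_messages=batch)
    while not new_job.caught_up(input_log):
        new_job.run(input_log, outputs, max_messages=batch)
    return new_job


input_log = ["a", "b", "a", "c", "a"]
outputs = {}

job = Job(output_topic="counts-v1")
job.run(input_log, outputs)

simple_rewind(job, input_log, outputs)                    # same topic, rebuilt from offset 0
new_job = parallel_rewind(input_log, outputs, "counts-v2")
consumer_topic = new_job.output_topic                     # the consumer-side change of option 2
```

In this sketch the trade-off from the comment shows up directly: `simple_rewind` needs no consumer change but overwrites the only output while it rebuilds, whereas `parallel_rewind` leaves `counts-v1` untouched and requires the consumer to move to `counts-v2`.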
