-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20989/#review42108
-----------------------------------------------------------


This is good. As a high-level comment, I think it would help to introduce 
this topic by saying that there are several ways to handle reprocessing, give 
each a name, discuss the pros and cons, and then document the checkpoint 
handling. I think it would be worth covering:
1. "Simple rewind": Just delete your state and reset the job checkpoint to 0. 
(pro: super simple, con: some downtime).
2. "Parallel rewind": Start a second copy of the job in parallel, writing to a 
new output topic. Switch consumers over once it has caught up. Alternatively, 
the two copies can share an output topic if the consumers can handle that. 
(pro: still pretty simple, con: need to change the consumers too).
3. "Lambda architecture": Reimplement the job in Hadoop and do the reprocessing 
there (con: need to somehow maintain the logic and operate it in two systems).
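
To make the "switch consumers when caught up" step of option 2 concrete, here 
is a minimal sketch. It is only an illustration, not Samza API: the function 
names, the per-partition offset dictionaries, and the max_lag threshold are all 
hypothetical, and it assumes you can obtain both the rewound job's checkpointed 
offsets and the current head offsets of the input stream by some other means.

```python
from typing import Dict


def caught_up(reprocessed: Dict[int, int], head: Dict[int, int], max_lag: int = 0) -> bool:
    """Return True when the rewound job is within max_lag messages of the
    head offset on every input partition.

    Both arguments map partition id -> offset (hypothetical representation).
    A partition the rewound job has not checkpointed yet is treated as offset 0.
    """
    return all(head[p] - reprocessed.get(p, 0) <= max_lag for p in head)


def active_topic(old_topic: str, new_topic: str,
                 reprocessed: Dict[int, int], head: Dict[int, int],
                 max_lag: int = 0) -> str:
    """Pick the output topic consumers should read from: stay on the old
    topic until the parallel job has caught up, then switch to the new one."""
    return new_topic if caught_up(reprocessed, head, max_lag) else old_topic
```

A consumer-side check could call active_topic("out-v1", "out-v2", ...) 
periodically (the topic names are made up) and switch once it returns the new 
topic. Allowing a small max_lag avoids waiting forever on a busy input stream.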

- Jay Kreps


On May 1, 2014, 10:14 p.m., Martin Kleppmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20989/
> -----------------------------------------------------------
> 
> (Updated May 1, 2014, 10:14 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-252: Add page on reprocessing to the docs.
> 
> 
> Diffs
> -----
> 
>   docs/_layouts/default.html 0a5ad9f63110c68424360773b9fcd005e4a059a9 
>   docs/learn/documentation/0.7.0/index.html 7806baf71bee61e5316d5bc627fee219012d3375 
>   docs/learn/documentation/0.7.0/jobs/logging.md 6bb6bf4b3630165159acc47e4cfb8e1afe6659cb 
>   docs/learn/documentation/0.7.0/jobs/reprocessing.md PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/20989/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Martin Kleppmann
> 
>
