dxichen opened a new pull request #1571:
URL: https://github.com/apache/samza/pull/1571


   Symptom: Checkpoint v2 preference could cause stale checkpoints to get 
picked up when newer v1 checkpoints are available
   
   Cause: Kafka checkpoint manager selects the checkpoints based on the 
`task.checkpoint.read.versions` priority list. This does not account for the 
fact that the prioritized checkpoint may be stale and a new checkpoint lower on 
the priority list is present in the checkpoint topic. This will cause the job 
to silently use the stale checkpoint and cause more reprocessing than the most 
recent commit. 
   
   An example case where this could happen is when the users upgrades to a v2 
checkpoint commits a v2 checkpoint and rollbacks to a previous version using 
only v1 checkpoint version then upgrading to the newest version again, using 
that stale v2 checkpoint from the initial upgrade.
   
   Changes:
   Added a `task.live.checkpoint.max.age` default 10 mins (must be > 
task.commit.ms interval; internal config) which would indicate if a checkpoint 
has gone stale if:
   a) There exists a checkpoint of another type more recent than it
   b) The diff between new checkpoint's kafka log append time and the 
prioritized version's log append time > `task.live.checkpoint.max.age'
   
   Tests: Integration tests writing to the checkpoint topic with stale 
checkpoint v2 and consuming it with a job that prioritize v2 over v1 
checkpoints with `task.live.checkpoint.max.age' = 0 asserting it prioritizes v1 
checkpoints instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to