We recently had an issue that caused us to lose the contents of the checkpoint 
topic for one of our Samza jobs. We were not that concerned about losing the 
checkpointed offsets, so we restarted the job. We then started seeing some 
very strange results and traced them back to the fact that the changelog 
partition mapping had changed. We were unaware this mapping was stored in the 
checkpoint topic. Can someone explain why this mapping is necessary? I was 
under the impression that the number of changelog partitions is identical to 
the number of task instances. If so, can't partitions just be assigned 
based on the task number? Assuming the mapping is necessary, it would be nice 
if it were deterministic. Looking at JobCoordinator, it seems to depend on 
the order in which entries come back in the map produced by the 
SystemStreamPartitionGrouper. This non-determinism seems to have been the cause 
of our issues. Obviously data loss is a problem, but it seems like Samza could 
have recreated the original mapping. Should I file a bug on this?
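
To illustrate the kind of deterministic assignment I have in mind, here is a 
minimal sketch. It is not Samza code: the class and method names are 
hypothetical, and it assumes each task can be identified by a unique string 
name. It sorts the task names and uses each name's position in the sorted 
order as its changelog partition, so the result does not depend on map 
iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Samza.
class DeterministicChangelogMapping {

    // Assigns changelog partition i to the i-th task name in sorted order.
    // Assumes task names are unique.
    public static Map<String, Integer> assign(Collection<String> taskNames) {
        List<String> sorted = new ArrayList<>(taskNames);
        Collections.sort(sorted); // lexicographic order, independent of input order
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            mapping.put(sorted.get(i), i);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The same task names arriving in two different orders.
        Map<String, Integer> first =
            assign(Arrays.asList("Partition 2", "Partition 0", "Partition 1"));
        Map<String, Integer> second =
            assign(Arrays.asList("Partition 1", "Partition 2", "Partition 0"));
        System.out.println(first.equals(second)); // prints: true
    }
}
```

A scheme like this is only stable while the set of task names stays the same. 
If tasks are added or removed, existing tasks could shift to different 
partitions, which may be one reason to persist the mapping instead.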

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com>
tobec...@tivo.com<mailto:tobec...@tivo.com>
