Chris Riccomini created SAMZA-180:
-------------------------------------
Summary: Support one-time offset reset for a Samza job
Key: SAMZA-180
URL: https://issues.apache.org/jira/browse/SAMZA-180
Project: Samza
Issue Type: Bug
Components: container
Affects Versions: 0.6.0
Reporter: Chris Riccomini

Samza currently has a systems.%s.streams.%s.samza.reset.offset configuration.
When set to "true", it tells each SamzaContainer to disregard the checkpointed
offsets for a stream when starting up. The problem is that the checkpoints are
disregarded every time the SamzaContainer starts up, not just the first time.
If the host running a SamzaContainer fails, and YARN (or some other mechanism)
restarts the SamzaContainer, the container will not pick up where it left off.
Instead, it will disregard the checkpointed offsets again and start over.
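
For illustration, this is roughly how the setting might look in a job's
properties file. The system name "kafka" and stream name "my-topic" are
placeholders that fill the two %s slots; they are not taken from this issue.

```properties
# Hypothetical system ("kafka") and stream ("my-topic") names.
# With this set, checkpointed offsets for the stream are ignored on every
# container start, including restarts after a failure.
systems.kafka.streams.my-topic.samza.reset.offset=true
```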

Some use cases call for a one-time reset of the checkpointed offsets: the
offsets are reset exactly once, and later failures do not trigger another
reset. This is typically useful in bootstrapping cases (related to SAMZA-179),
where a developer wishes to reset a task back to offset 0, process all messages
up to the head of a stream, and then shut down. Right now, the developer can
set reset.offset=true and auto.offset.reset=smallest (if reprocessing a Kafka
topic), but if the container ever restarts, processing will begin again from
offset 0 rather than from the last checkpoint. This is not ideal.
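
The difference between the current and the requested behavior can be sketched
with a toy model. This is not Samza code and not a proposed implementation:
the class, the method names, the in-memory checkpoint map, and the "reset
already done" marker are all hypothetical and exist only to illustrate the
semantics described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (not Samza code): reset on every start vs. reset exactly once. */
class OffsetResetSketch {
    // Stand-in for the checkpoint store: stream name -> last checkpointed offset.
    static final Map<String, Long> checkpoints = new HashMap<>();

    // Hypothetical durable marker: streams whose one-time reset already happened.
    static final Set<String> resetDone = new HashSet<>();

    /** Current behavior: the reset flag discards the checkpoint on every start. */
    static long startingOffsetAlwaysReset(String stream, boolean resetOffset) {
        if (resetOffset) {
            return 0L;
        }
        return checkpoints.getOrDefault(stream, 0L);
    }

    /** Requested behavior: the reset flag is honored only on the first start. */
    static long startingOffsetResetOnce(String stream, boolean resetOffset) {
        // Set.add returns true only the first time the stream is recorded.
        if (resetOffset && resetDone.add(stream)) {
            return 0L;
        }
        return checkpoints.getOrDefault(stream, 0L);
    }

    public static void main(String[] args) {
        String stream = "my-topic"; // placeholder stream name
        checkpoints.put(stream, 500L); // checkpoint left by an earlier run

        // First start with the reset flag set: begin at offset 0.
        long first = startingOffsetResetOnce(stream, true);
        System.out.println("first start: " + first);

        // The job processes some messages and checkpoints, then the host fails.
        checkpoints.put(stream, 120L);

        // Restart with the same configuration still in place.
        System.out.println("restart, always reset: "
                + startingOffsetAlwaysReset(stream, true));
        System.out.println("restart, reset once:   "
                + startingOffsetResetOnce(stream, true));
    }
}
```

In this model, the restart begins at offset 0 under the always-reset function
and at the checkpointed offset 120 under the reset-once function. Where the
"reset already done" marker would actually be stored is left open here.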