Chris Riccomini created SAMZA-180:
-------------------------------------

             Summary: Support one-time offset reset for a Samza job
                 Key: SAMZA-180
                 URL: https://issues.apache.org/jira/browse/SAMZA-180
             Project: Samza
          Issue Type: Bug
          Components: container
    Affects Versions: 0.6.0
            Reporter: Chris Riccomini


Samza currently has a systems.%s.streams.%s.samza.reset.offset configuration. 
When set to "true", this configuration tells each SamzaContainer to disregard 
the checkpointed offsets for a stream when starting up. The problem with this 
configuration is that the checkpoints are disregarded every time the 
SamzaContainer starts up, not just the first time. If the host that a 
SamzaContainer is running on fails, and YARN (or some other mechanism) restarts 
the SamzaContainer, the container will not pick up where it left off. Instead, 
it will disregard the checkpointed offsets again and start over.
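
To make the failure mode concrete, here is a minimal sketch of the start-up 
decision described above. It is a hypothetical model, not Samza's actual code: 
the function name starting_offset and the convention that None means "defer to 
the system's default starting position" are assumptions made for illustration.

```python
from typing import Optional


def starting_offset(checkpoint: Optional[int], reset_offset: bool) -> Optional[int]:
    """Hypothetical model of the current behavior (not Samza's real API).

    Returns the offset to resume from, or None to mean "no usable
    checkpoint; defer to the system's default starting position".
    """
    if reset_offset:
        # The flag is read on every start, so the checkpoint is
        # disregarded every time, not only on the first start.
        return None
    return checkpoint


# First start: no checkpoint exists yet and the reset flag is "true".
first_start = starting_offset(checkpoint=None, reset_offset=True)

# Restart after a host failure: a checkpoint at offset 500 exists, but the
# flag is still "true" in the job config, so the checkpoint is ignored again.
after_restart = starting_offset(checkpoint=500, reset_offset=True)

# Without the flag, the restarted container resumes from its checkpoint.
without_flag = starting_offset(checkpoint=500, reset_offset=False)
```

In this model the restart looks exactly like the first start, which is the 
problem: nothing records that the reset has already happened.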

There are some use-cases where developers wish to have a one-time reset of the 
checkpointed offsets. That is, they want to reset the offsets exactly once, but 
then have later failures not trigger another reset. This is typically useful in 
bootstrapping cases (related to SAMZA-179), where a developer wishes to reset 
their task back to offset 0, process all messages up to the head of a stream, 
and then shut down. Right now, the developer can set reset.offset=true and 
auto.offset.reset=smallest (if reprocessing a Kafka topic), but if the 
container ever restarts, processing will begin again from offset 0, discarding 
any progress made before the failure.
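
One possible shape for a one-time reset is sketched below. This is only an 
illustration under stated assumptions, not a proposed implementation: the 
CheckpointStore class, its reset_applied marker, and the resume_offset function 
are hypothetical names. The idea is that the fact "the reset has already been 
applied" is stored durably alongside the checkpoint, so a restarted container 
can tell a first start apart from a recovery.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for durable checkpoint storage."""
    offset: Optional[int] = None
    reset_applied: bool = False  # assumed marker, persisted with the checkpoint


def resume_offset(store: CheckpointStore, reset_once: bool) -> int:
    """Return the offset to start from, applying the reset at most once."""
    if reset_once and not store.reset_applied:
        # First start with the reset requested: discard the old checkpoint
        # and record that the reset has been done.
        store.offset = 0
        store.reset_applied = True
    # With no checkpoint at all, this sketch starts from offset 0.
    return store.offset if store.offset is not None else 0


# A job with an old checkpoint at offset 900 is started with a one-time reset.
store = CheckpointStore(offset=900)
first_start = resume_offset(store, reset_once=True)

# The container processes messages and checkpoints at offset 500, then fails.
store.offset = 500

# The restarted container still sees reset_once=True in its config, but the
# persisted marker prevents a second reset.
after_restart = resume_offset(store, reset_once=True)
```

Here the first start resumes from offset 0 and the restart resumes from the 
checkpoint at 500, even though the job configuration is unchanged between the 
two starts.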



--
This message was sent by Atlassian JIRA
(v6.2#6252)
