[ https://issues.apache.org/jira/browse/SAMZA-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942202#comment-13942202 ]
Chris Riccomini commented on SAMZA-180:
---------------------------------------
As [~jghoman] said, this is really more of a batch use case, and I agree that
the least invasive/easiest to delete approach is best, so I support the
CLI-based tool.
Also, a second use case that I think the CLI suits well is the ops/SRE side of
things, where something went wrong and we just want to force the job to go back
some number of messages and re-process the "bad" ones.
This also falls into the (2) use case list.
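As a rough illustration of the "go back some number of messages" case, here is a minimal Python sketch. It is not the tool's actual code: the function name rewind_offsets, the partition-to-offset dict, and the assumption that offsets are plain integers (as with Kafka) are all hypothetical.

```python
def rewind_offsets(checkpoint, num_messages):
    """Return a new checkpoint with every partition's offset moved back by
    num_messages, never going below offset 0.

    checkpoint is a hypothetical {partition_id: offset} dict; real Samza
    checkpoints are stored differently.
    """
    if num_messages < 0:
        raise ValueError("num_messages must be non-negative")
    return {
        partition: max(0, offset - num_messages)
        for partition, offset in checkpoint.items()
    }


# Hypothetical checkpoint: partition id -> last checkpointed offset.
checkpoint = {0: 1500, 1: 40, 2: 987}
rewound = rewind_offsets(checkpoint, 100)
```

The sketch returns a new dict rather than mutating the input, so the old checkpoint is still around if the rewind needs to be undone.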
Re: shell script, yeah it should just live in samza-shell alongside the other
run-* scripts.
> Support one-time offset reset for a Samza job
> ---------------------------------------------
>
> Key: SAMZA-180
> URL: https://issues.apache.org/jira/browse/SAMZA-180
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Assignee: Martin Kleppmann
> Attachments: SAMZA-180.1.patch
>
>
> Samza currently has a systems.%s.streams.%s.samza.reset.offset configuration.
> When set to "true", this configuration tells each SamzaContainer to disregard
> the checkpointed offsets for a stream when starting up. The problem with this
> configuration is that the checkpoints are disregarded every time the
> SamzaContainer starts up, not just the first time. If a host that a
> SamzaContainer is running on fails, and YARN (or some other mechanism)
> restarts the SamzaContainer, the container will not pick up where it left
> off, but will instead disregard the checkpointed offsets, and start over
> again, as before.
> There are some use-cases where developers wish to have a one-time reset of
> the checkpointed offsets. That is, they want to reset the offsets exactly
> once, but then have failures not trigger another reset. This is typically
> useful in bootstrapping cases (related to SAMZA-179), where a developer
> wishes to reset their task back to offset 0, and process all messages up to the
> head of a stream, then shut down. Right now, the developer can set
> reset.offset=true, and auto.offset.reset=smallest (if reprocessing a Kafka
> topic), but if the container ever restarts, processing will begin again from
> offset 0. This is not ideal.
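To make the restart problem in the description concrete, here is a small Python model of it. This is only a sketch under simplifying assumptions, not SamzaContainer code: offsets are plain integers, "reset" always means going back to offset 0 (as with auto.offset.reset=smallest), and the names starting_offset and simulate are hypothetical.

```python
def starting_offset(checkpointed_offset, reset_offset):
    """Offset a container begins from on startup.

    If reset_offset is set, the checkpoint is disregarded and processing
    starts from offset 0; otherwise it resumes from the checkpoint.
    """
    if reset_offset:
        return 0
    return checkpointed_offset


def simulate(reset_flags, old_checkpoint=100, progress_per_run=50):
    """Return the offset each successive container start begins from.

    reset_flags holds one boolean per container start: whether the reset
    is in effect for that start. After each start the container is assumed
    to process progress_per_run messages, checkpoint, and then fail.
    """
    checkpoint = old_checkpoint
    starts = []
    for reset in reset_flags:
        start = starting_offset(checkpoint, reset)
        starts.append(start)
        checkpoint = start + progress_per_run
    return starts


# Current behavior: the config stays "true" across the restart.
current_behavior = simulate([True, True])
# Desired behavior: the reset applies to the first start only.
one_time_reset = simulate([True, False])
```

With the config left on, both starts begin at offset 0, so the 50 messages processed before the failure are processed again. With a one-time reset, the second start resumes from the checkpoint at offset 50.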