[ https://issues.apache.org/jira/browse/SAMZA-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942118#comment-13942118 ]

Jakob Homan commented on SAMZA-180:
-----------------------------------

Been thinking about this some more. All of the solutions above seem a bit
hacky because the use case goes against what Samza offers right now: static
configuration that is fixed once the job starts up (and any config rewriters
have had their say). I had thought that 'one-time' configs could be introduced
that would be fed to the first instance of a container but not to subsequent
ones. But since there is no communication from the SamzaContainer back to the
AM after the container starts, there is no way for a container to signal to
the AM that it has done its one-time work and that later instances should get
the regular 'next-time' configs.
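The gap can be sketched in a few lines of Java. This is a hypothetical model, not Samza's AM code, and the method names are made up for illustration. It assumes that, with no message coming back from the container, the AM can only choose which config to hand out based on how many times it has launched the container. If the first launch dies before it commits a checkpoint past the reset, the second launch gets the regular config and resumes from the stale checkpoint.

```java
// Hypothetical model of the "one-time config" idea; not Samza's AM code.
// Assumption: with no channel from the container back to the AM, the AM can
// only choose a config based on how many times it has launched the container.
public class OneTimeConfigSketch {

    // One-time config: reset.offset is true for the first launch only.
    static boolean resetOffsetForLaunch(int launchNumber) {
        return launchNumber == 1;
    }

    // Starting offset a container would pick under that config.
    static long startingOffset(boolean resetOffset, long checkpointed, long oldest) {
        return resetOffset ? oldest : checkpointed;
    }

    public static void main(String[] args) {
        long checkpointed = 500L; // checkpoint written before the reset was requested
        long oldest = 0L;

        // Launch 1 gets the one-time config and starts from the oldest offset,
        // but its host dies before it commits a new checkpoint.
        long first = startingOffset(resetOffsetForLaunch(1), checkpointed, oldest);

        // Launch 2 gets the regular config and resumes from the old checkpoint,
        // so the requested reset is silently lost.
        long second = startingOffset(resetOffsetForLaunch(2), checkpointed, oldest);

        System.out.println(first + " " + second); // prints "0 500"
    }
}
```

Keying on launch count is the only option in this model precisely because nothing flows from container to AM; the AM would need to hear "checkpoint committed after reset" before it could safely switch configs.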

Fundamentally, this use case is a batch one rather than a streaming one, and
hence will always be awkward in Samza. As such, the tool described above may
be the best bet: it's the least invasive to the framework and the easiest to
remove once a better approach is available.

> Support one-time offset reset for a Samza job
> ---------------------------------------------
>
>                 Key: SAMZA-180
>                 URL: https://issues.apache.org/jira/browse/SAMZA-180
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>            Assignee: Martin Kleppmann
>         Attachments: SAMZA-180.1.patch
>
>
> Samza currently has a systems.%s.streams.%s.samza.reset.offset configuration. 
> When set to "true", this configuration tells each SamzaContainer to disregard 
> the checkpointed offsets for a stream when starting up. The problem with this 
> configuration is that the checkpoints are disregarded every time the 
> SamzaContainer starts up, not just the first time. If a host that a 
> SamzaContainer is running on fails, and YARN (or some other mechanism) 
> restarts the SamzaContainer, the container will not pick up where it left 
> off, but will instead disregard the checkpointed offsets, and start over 
> again, as before.
> There are some use-cases where developers wish to have a one-time reset of 
> the checkpointed offsets. That is, they want to reset the offsets exactly 
> once, but then have failures not trigger another reset. This is typically 
> useful in bootstrapping cases (related to SAMZA-179), where a developer 
> wishes to reset their task back to offset 0, process all messages up to the 
> head of a stream, then shut down. Right now, the developer can set 
> reset.offset=true, and auto.offset.reset=smallest (if reprocessing a Kafka 
> topic), but if the container ever restarts, processing will begin again from 
> offset 0. This is not ideal.
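To make the difference between the two semantics concrete, here is a minimal Java sketch of how a container might pick its starting offset. It is not Samza's checkpointing code: the CheckpointStore class, the reset id, and both method names are hypothetical. The key assumption is that a one-time reset is identified by an id written atomically with the next checkpoint, so the reset only counts as done once progress made after it has been recorded. Falling back to the oldest offset when no checkpoint exists stands in for auto.offset.reset=smallest.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of starting-offset selection; not Samza's checkpoint code.
public class OffsetResetSketch {

    // Stand-in for a durable checkpoint store. The id of the reset a container
    // ran under is written in the same commit as the offset, so a reset only
    // counts as applied once progress made after it has been checkpointed.
    static final class CheckpointStore {
        private final Map<String, Long> offsets = new HashMap<>();
        private final Map<String, String> appliedResetIds = new HashMap<>();

        void commit(String partition, long offset, String resetId) {
            offsets.put(partition, offset);
            if (resetId != null) {
                appliedResetIds.put(partition, resetId);
            }
        }

        Long offset(String partition) {
            return offsets.get(partition);
        }

        String appliedResetId(String partition) {
            return appliedResetIds.get(partition);
        }
    }

    // Current semantics: reset.offset=true ignores the checkpoint on every start.
    static long startAlwaysReset(CheckpointStore store, String partition,
                                 boolean resetOffset, long oldestOffset) {
        Long checkpointed = store.offset(partition);
        return (resetOffset || checkpointed == null) ? oldestOffset : checkpointed;
    }

    // One-time semantics: a reset (identified by resetId, null = none requested)
    // is honored only until a checkpoint tagged with that id has been committed.
    static long startOneTimeReset(CheckpointStore store, String partition,
                                  String resetId, long oldestOffset) {
        Long checkpointed = store.offset(partition);
        boolean resetPending = resetId != null
                && !resetId.equals(store.appliedResetId(partition));
        return (resetPending || checkpointed == null) ? oldestOffset : checkpointed;
    }

    public static void main(String[] args) {
        long oldest = 0L;

        CheckpointStore always = new CheckpointStore();
        always.commit("p0", 500L, null);    // checkpoint from before the reset
        long a1 = startAlwaysReset(always, "p0", true, oldest);
        always.commit("p0", 120L, null);    // some progress, then the host dies
        long a2 = startAlwaysReset(always, "p0", true, oldest);

        CheckpointStore once = new CheckpointStore();
        once.commit("p0", 500L, null);
        long o1 = startOneTimeReset(once, "p0", "reset-1", oldest);
        once.commit("p0", 120L, "reset-1"); // progress after the reset is checkpointed
        long o2 = startOneTimeReset(once, "p0", "reset-1", oldest);

        System.out.println(a1 + " " + a2 + " " + o1 + " " + o2); // prints "0 0 0 120"
    }
}
```

A host failure before the first tagged commit leaves the reset pending, so the restart starts from the oldest offset again; that is still correct, because no post-reset progress was recorded. The sketch uses a reset id rather than a boolean flag because a plain "reset done" flag could not tell a checkpoint written before the reset apart from one written after it.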



--
This message was sent by Atlassian JIRA
(v6.2#6252)
