[ https://issues.apache.org/jira/browse/SAMZA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361227#comment-14361227 ]
Chris Riccomini commented on SAMZA-568:
---------------------------------------
bq. Any thoughts on the API for this?
I wonder whether this should live in TaskContext, SamzaContainerContext, or
TaskCoordinator. If we eventually want to let a task change offsets from
anywhere (not just during init()), TaskCoordinator seems like the better
place for it.
Another approach would be to expose OffsetManager through SamzaContainerContext
(and expose SamzaContainerContext through TaskContext).
In the short term, it seems fairly easy to support this use case the way your
patch does. As long as the Javadocs make clear that the API is not stable and
might change, I think it's fine to move forward with what we've got. With a few
docs and tests, I think we should be good.
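To make the short-term option concrete, here is a minimal sketch of what a TaskContext-style override method, with a Javadoc that flags it as unstable, could look like. All names here (`StartingOffsetOverrides`, `MapBackedOverrides`, `snapshot`) are hypothetical and are not Samza's actual API; partitions are keyed by plain strings as a stand-in for Samza's SystemStreamPartition so the example compiles on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical interface a container could hand to each task during init().
 *
 * <p><b>Unstable:</b> this API may move (for example to TaskCoordinator)
 * or change in a future release.
 */
interface StartingOffsetOverrides {
    /**
     * Requests that the given partition start consuming from {@code offset}
     * instead of the checkpointed offset. A later call for the same
     * partition replaces the earlier one.
     */
    void setStartingOffset(String systemStreamPartition, String offset);
}

/** Hypothetical map-backed implementation kept by the container. */
class MapBackedOverrides implements StartingOffsetOverrides {
    private final Map<String, String> overrides = new HashMap<>();

    @Override
    public void setStartingOffset(String systemStreamPartition, String offset) {
        overrides.put(systemStreamPartition, offset);
    }

    /** Read-only view the container would consult before starting consumers. */
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(overrides);
    }
}
```

Keeping the override behind a narrow interface like this is one way to leave room for moving it later (for example to TaskCoordinator) without changing much task code.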
> Start offset override in Task init
> ----------------------------------
>
> Key: SAMZA-568
> URL: https://issues.apache.org/jira/browse/SAMZA-568
> Project: Samza
> Issue Type: Improvement
> Components: container
> Affects Versions: 0.9.0
> Reporter: Ben Kirwin
> Priority: Minor
> Attachments:
> 0001-Allow-overriding-starting-offsets-in-TaskContext.patch
>
>
> A couple of months back -- [on the mailing list |
> http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3ccacux-d_zwzp2emqse4nou76skfh6bkifitzsmnm_b8dxjut...@mail.gmail.com%3E]
> -- I mentioned a couple of offset-management issues I'd been having. (I'm happy
> to elaborate on this, but in short: I associate some extra state / ordering
> information with the input offsets, and there's a nontrivial performance cost
> to keeping Samza's checkpoints and my task's state in sync.)
> It occurs to me now that there's a simple workaround for this: disable
> Samza's checkpointing entirely, and let `StreamTask.init` choose the starting
> offsets. The task can just keep its checkpoints in an ordinary StorageEngine
> -- and by managing all the state from a single place, it's easy to keep
> everything in sync.
> The basic implementation actually seems fairly straightforward -- the
> consumers are not started until after the tasks are initialized, so all we'd
> need to do is allow the `init` method to override the starting offsets. I've
> attached a small patch that exposes this through the TaskContext interface,
> just to illustrate the idea -- if this seems like an interesting feature for
> Samza, I'm happy to add more tests / documentation / etc.
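A minimal sketch of the proposed workaround, under hypothetical names (`OffsetStoringTask`, `ContainerStartup`, `resolveStartingOffsets`): the task records, per partition, the offset it wants to resume from in the same store as its state, and in init() hands those offsets back as overrides; the container merges them over any checkpointed offsets before starting consumers. A `HashMap` stands in for the StorageEngine, partitions are plain strings, and the precedence rule (an override beats a checkpoint) is an assumption about how such a patch would behave, not confirmed Samza behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical task that keeps its own input offsets alongside its state. */
class OffsetStoringTask {
    private static final String OFFSET_PREFIX = "offset:";
    private static final String STATE_PREFIX = "state:";

    // Stand-in for a Samza key-value store (StorageEngine).
    private final Map<String, String> store;

    OffsetStoringTask(Map<String, String> store) {
        this.store = store;
    }

    /**
     * Runs before any consumer starts. Returns, per partition, the offset
     * the task wants to resume from, read back from its own store.
     */
    Map<String, String> init() {
        Map<String, String> overrides = new HashMap<>();
        for (Map.Entry<String, String> e : store.entrySet()) {
            if (e.getKey().startsWith(OFFSET_PREFIX)) {
                overrides.put(e.getKey().substring(OFFSET_PREFIX.length()), e.getValue());
            }
        }
        return overrides;
    }

    /**
     * Records the task's state for a partition and the offset to resume from
     * in the same call, to the same store, so there is no separate
     * checkpoint to keep in sync.
     */
    void process(String partition, String message, String resumeOffset) {
        store.put(STATE_PREFIX + partition, message);
        store.put(OFFSET_PREFIX + partition, resumeOffset);
    }
}

/** Hypothetical container-side step, run after init() and before consumers start. */
class ContainerStartup {
    /** A task override wins; otherwise the checkpointed offset (if any) is kept. */
    static Map<String, String> resolveStartingOffsets(
            Map<String, String> checkpointed, Map<String, String> overrides) {
        Map<String, String> resolved = new HashMap<>(checkpointed);
        resolved.putAll(overrides);
        return resolved;
    }
}
```

With Samza's own checkpointing disabled, `checkpointed` would simply be empty, and any partition the task does not override would start wherever the system's default starting position puts it.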
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)