Ben Kirwin created SAMZA-568:
--------------------------------
Summary: Start offset override in Task init
Key: SAMZA-568
URL: https://issues.apache.org/jira/browse/SAMZA-568
Project: Samza
Issue Type: Improvement
Reporter: Ben Kirwin
Priority: Minor

A couple of months back -- [on the mailing list |
http://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201411.mbox/%3ccacux-d_zwzp2emqse4nou76skfh6bkifitzsmnm_b8dxjut...@mail.gmail.com%3E]
-- I mentioned a couple of offset management issues I'd been having. (I'm happy
to elaborate on this, but in short: I associate some extra state / ordering
information with the input offsets, and there's a nontrivial performance cost
to keeping Samza's checkpoints and my task's state in sync.)

It occurs to me now that there's a simple workaround for this: disable Samza's
checkpointing entirely, and let `StreamTask.init` choose the starting offsets.
The task can just keep its checkpoints in an ordinary StorageEngine -- and by
managing all the state from a single place, it's easy to keep everything in
sync.
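
As a rough illustration of the workaround above, here is a minimal sketch of a task that writes its last processed offset into the same store as its state. Everything in it is hypothetical: `InMemoryStore` is only a stand-in for a StorageEngine (it is not Samza's API), and the class, method, and key names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetInStoreSketch {
    // Hypothetical stand-in for a key-value StorageEngine; NOT Samza's API.
    static class InMemoryStore {
        private final Map<String, String> data = new HashMap<>();

        // Stands in for a single batched write, so state and offset move together.
        void putAll(Map<String, String> batch) { data.putAll(batch); }

        String get(String key) { return data.get(key); }
    }

    // Hypothetical reserved key under which the task records its own checkpoint.
    static final String OFFSET_KEY = "__offset";

    // Update the task state and record the message's offset in the same batch.
    static void process(InMemoryStore store, String offset, String message) {
        String prev = store.get("count");
        long count = (prev == null) ? 0L : Long.parseLong(prev);

        Map<String, String> batch = new HashMap<>();
        batch.put("count", Long.toString(count + 1));
        batch.put("last", message);
        batch.put(OFFSET_KEY, offset);
        store.putAll(batch);
    }

    // On restart, the task would read this back to decide where to resume.
    static String lastProcessedOffset(InMemoryStore store) {
        return store.get(OFFSET_KEY);
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        process(store, "41", "a");
        process(store, "42", "b");
        System.out.println(lastProcessedOffset(store) + " " + store.get("count"));
    }
}
```

Because the offset lives next to the state it describes, there is only one thing to flush, and nothing separate to keep in sync.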

The basic implementation actually seems fairly straightforward -- the consumers
are not started until after the tasks are initialized, so all we'd need to do
is allow the `init` method to override the starting offsets. I've attached a
small patch that exposes this through the TaskContext interface, just to
illustrate the idea -- if this seems like an interesting feature for Samza, I'm
happy to add more tests / documentation / etc.
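
To show the shape of the idea (not the attached patch itself), here is a small self-contained sketch. The `TaskContext` and `Task` interfaces, the `setStartingOffset` method name, and the `Container` class are all assumptions made for illustration; the real Samza interfaces and the patch may look different.

```java
import java.util.HashMap;
import java.util.Map;

public class StartOffsetOverrideSketch {
    // Hypothetical slice of a task context; the method name is an assumption.
    interface TaskContext {
        void setStartingOffset(String stream, String offset);
    }

    // Hypothetical task interface with only the init hook.
    interface Task {
        void init(TaskContext context);
    }

    // Hypothetical container: runs init first, then resolves the offsets
    // that the consumers would be started from.
    static class Container implements TaskContext {
        private final Map<String, String> startingOffsets = new HashMap<>();

        Container(Map<String, String> defaultOffsets) {
            startingOffsets.putAll(defaultOffsets);
        }

        @Override
        public void setStartingOffset(String stream, String offset) {
            // An override from init replaces the default for that stream.
            startingOffsets.put(stream, offset);
        }

        Map<String, String> start(Task task) {
            task.init(this);                       // task may override offsets here
            return new HashMap<>(startingOffsets); // consumers would start from these
        }
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("input", "10");
        defaults.put("other", "5");

        Container container = new Container(defaults);
        // The task overrides one stream, e.g. with an offset read from its own store.
        Map<String, String> resolved =
            container.start(ctx -> ctx.setStartingOffset("input", "42"));

        System.out.println(resolved.get("input") + " " + resolved.get("other"));
    }
}
```

Streams that `init` does not touch keep their default starting offset, so a task that ignores the new hook behaves as before.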